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Abstract 

In  a  distributed  wireless  system,  such  as  an  ad  hoc  network,  multiple  nodes  behave  cooperatively 
towards  a  common  goal.  For  doing  so,  they  make  decisions  independently  and  cooperatively  such  that 
they  benefit  the  mission  of  the  network.  Though  such  assumptions  on  cooperation  are  desirable  (e.g., 
controlling  the  transmit  power  level,  reducing  interference  for  each  other,  revealing  private  information, 
adhering  to  network  policies)  for  analyzing  and  modeling,  certain  nodes  belonging  to  a  real-world  system 
have  often  shown  to  deviate.  These  nodes,  known  as  misbehaving  nodes,  bring  more  challenges  to  the 
design  of  the  wireless  network  because  the  unreliable  channel  makes  the  actions  of  the  nodes  hidden  from 
each  other. 

In  this  project,  we  analyze  misbehavior  in  wireless  networks  that  have  dire  consequences  on  the  perfor¬ 
mance  of  the  network.  In  particular,  we  analyze  two  types  of  misbehavior,  namely,  selfish  noncooperation 
and  malicious  attacking.  We  apply  game  theoretic  techniques  to  model  the  interactions  among  the  nodes  in 
the  network.  First,  we  consider  a  homogeneous  unreliable  channel  and  analyze  the  necessary  and  sufficient 
conditions  to  enforce  cooperative  packet  forwarding  among  a  node  pair.  We  formulate  an  anti-collusion 
game  and  derive  the  conditions  that  achieve  full  cooperation  when  the  non-cooperative  nodes  collude.  In 
addition,  we  consider  multi-hop  communication  with  a  heterogeneous  channel  model.  We  refine  our  game 
model  as  a  hidden  action  game  with  imperfect  private  monitoring.  A  state  machine  based  strategy  is  pro¬ 
posed  to  reach  Nash  Equilibrium.  The  strategy  attains  cooperative  packet  forwarding  with  heterogeneous 
channel  and  requires  only  partial  and  imperfect  information.  Furthermore,  it  also  enforces  cooperation 
in  multi-hop  packet  forwarding.  To  tackle  the  malicious  attacks,  we  use  Bayesian  game  analysis  to  show 
the  existence  of  equilibrium  in  the  detection  game  and  argue  that  it  might  not  be  profitable  to  isolate 
the  malicious  nodes  upon  detection.  We  propose  the  concept  of  “coexistence  with  malicious  nodes”  by 
proving  the  co-existence  equilibrium  and  derive  the  conditions  that  achieve  the  equilibrium. 

To  validate  and  test  the  proposed  theoretical  models,  we  conduct  extensive  simulation  studies.  Sim¬ 
ulation  results  illustrate  the  properties  of  the  games  and  the  derived  equilibria.  The  results  validate  our 
design  philosophy  and  clearly  indicate  that  the  proposed  game  theoretic  solutions  can  be  effectively  used 
to  enforce  cooperation  and  mitigate  attacks. 
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1  Introduction 


In  a  distributed  wireless  system  where  multiple  network  entities  (also  called  nodes)  work  towards  individual 
or  common  goals,  cooperative  behavior  among  the  nodes  (such  as  controlling  the  transmit  power  level, 
reducing  interference  for  each  other,  revealing  private  information,  adhering  to  network  policies)  is  highly 
desired  for  increasing  system  capacity.  Though  this  desirable  property  makes  it  easy  to  analyze  a  system 
due  to  state  space  reduction;  in  reality,  this  assumption  might  be  too  strong.  For  example,  there  might  be 
nodes  in  the  network  which  might  act  in  a  selfish  and/or  malicious  manner.  These  nodes,  which  are  also 
known  as  misbehaving  nodes,  bring  more  challenges  to  the  design  of  the  wireless  network. 

1.1  Misbehavior  in  Wireless  Networks 

The  research  of  mitigating  misbehavior  in  wireless  network  is  motivated  from  the  following  example  where 
the  network  services  are  disrupted  by  nodes’  misbehavior.  We  consider  a  wireless  network  of  mobile  devices 
(nodes)  operating  in  an  area  without  any  network  infrastructure  support,  i.e.,  in  ad  hoc  mode.  When 
transferring  packets,  the  nodes  rely  on  each  other  for  forwarding  packets.  Thus,  it  is  very  important  that 
all  the  nodes  in  the  network  act  in  a  cooperative  manner.  However,  due  to  limited  resources  (e.g.,  energy 
supply,  computing  power)  that  each  node  has,  the  notion  of  cooperation  might  not  be  rational.  As  a  result, 
nodes  may  prefer  not  to  participate  in  packet  forwarding  or  even  worse,  they  can,  individually  or  in  groups, 
resist  cooperating  with  the  rest  of  the  network.  Hence,  noncooperation  is  a  type  of  misbehavior;  the  network 
might  become  disconnected  when  some  nodes  do  not  forward  packets  cooperatively.  Therefore,  an  important 
question  to  ask  is  how  to  stimulate  or  enforce  cooperation  among  the  nodes  to  ensure  the  proper  functioning 
of  the  network. 

There  are  two  aspects  to  this  question:  i)  if  the  nodes  are  rational  and  self-interested,  and  only  care  about 
maximizing  their  own  benefit,  how  do  we  design  incentives  so  that  the  nodes  are  willing  to  be  cooperative 
and  ii)  if  the  nodes  do  not  want  to  cooperate,  how  do  we  punish  them  so  that  the  punishment  enforces 
them  to  cooperate  in  the  future?  To  address  these  issues,  we  need  to  analyze  what  strategies  nodes  adopt 
and  what  actions  they  take  when  they  get  the  packets  to  be  forwarded.  Moreover,  in  wireless  network,  the 
wireless  channel  is  highly  unreliable  due  to  noise,  multipath  path  fading,  interference,  etc.  A  node’s  action 
in  packet  forwarding  may  not  be  accurately  observed  by  others,  or  in  other  words,  its  action  is  hidden  from 
the  rest  of  the  network.  Thus,  the  question  becomes  even  more  complicated  when  the  unreliable  channel  is 
considered. 

This  example  can  be  further  challenging  when  some  malicious  nodes  exist.  Unlike  the  uncooperative 
nodes,  the  malicious  nodes  launch  attacks  in  the  network.  In  the  context  of  packet  forwarding,  the  attacks 
can  be;  intentionally  dropping  packets,  altering  the  contents  maliciously,  and  etc.  The  objective  of  such 
malicious  nodes  is  to  cause  harm  and  bring  disorder  to  the  network;  their  goal  is  to  maximize  the  damage 
before  they  are  detected  and  isolated. 

In  order  to  minimize  the  impact  of  the  malicious  nodes,  detection  mechanisms  need  to  be  in  place.  Thus,  a 
regular  node  should  monitor  its  surroundings  and  distinguish  a  malicious  node  from  a  regular  one.  However, 
the  detection  process  has  challenges.  First,  monitoring  can  be  costly.  To  identify  the  malice,  a  regular  node 
has  to  listen  to  the  channel  and/or  process  the  information  sent  by  the  nodes  being  monitored.  Listening  and 
processing  consume  resources  and  hence,  an  “always  on”  monitoring  scheme  is  not  efficient  even  if  plausible. 
Second,  the  malicious  node  can  disguise  itself.  To  reduce  the  probability  of  being  detected,  a  malicious 


2 


can  behave  like  a  regular  node  and  choose  longer  intervals  between  attacks.  Third,  the  randomness  and 
unreliability  of  the  wireless  channel  bring  more  uncertainty  to  the  monitoring  and  detection  process. 

In  spite  of  the  above  challenges,  mechanisms  to  detect  malicious  nodes  can  always  be  designed.  However, 
the  important  question  is  ‘what  should  the  regular  node  do  upon  detecting  a  malicious  node?’  Though  the 
reasonable  response  would  be  to  immediately  isolate  the  malicious  node,  there  might  be  situations  where 
malicious  nodes  can  be  kept  and  made  use  of.  The  most  straightforward  reason  for  the  coexistence  is  that  a 
malicious  node  has  no  idea  whether  it  has  been  identified  or  not,  and  it  will  continue  to  operate  like  a  regular 
node  to  avoid  detection.  During  this  time,  i.e.,  when  the  malicious  node  cooperates  in  disguise,  it  can  be 
exploited  for  normal  network  operations.  This  “involuntary”  help  from  the  malicious  node  may  be  valuable, 
especially  when  the  network  resource  is  limited.  As  a  matter  of  fact,  from  the  perspective  of  the  malicious 
nodes,  coexistence  gives  them  a  longer  lifetime  in  the  network  and  the  opportunity  to  launch  future  attacks. 
As  far  as  the  regular  nodes  are  concerned,  they  have  a  criteria  to  evaluate  the  benefit  from  the  malicious 
nodes.  The  criteria  also  determine  when  to  terminate  the  coexistence  and  isolate  the  malicious  nodes. 

To  make  the  process  of  detection  even  more  difficult,  the  malicious  nodes  do  not  act  passively  and  wait 
to  be  detected.  Instead,  they  also  study  the  interaction  they  have  with  the  rest  of  the  network  and  adjust 
their  subsequent  actions  accordingly.  It  is  also  possible  that  a  malicious  node  is  wise  enough  to  learn  and 
predict  the  actions  of  the  regular  nodes  to  assist  itself  in  decision  making.  The  options  available  to  the 
malicious  nodes  complicate  the  solution  space  and  most  traditional  control  theoretic  approaches  fail  to  find 
the  equilibrium  strategies  for  both  the  regular  and  malicious  nodes.  In  particular,  these  problems  fall  more 
appropriately  in  the  domain  of  static  and  dynamic  distributed  games  and  thus  the  application  of  game 
theory  is  an  elegant  way  to  tackle  such  problems.  It  is  important  that  solution  concepts  from  game  theory 
are  used  to  guide  the  protocol  design  process  such  that  nodes  working  in  a  distributed  manner  can  co-exist, 
even  with  different  intents. 

1.2  Game  Theory  and  Wireless  Network  Design 

As  far  as  a  node  in  a  wireless  network  is  concerned,  its  action  is  accompanied  with  a  cost  or  utility,  e.g.,  nodes 
consume  energy  in  monitoring  or  forwarding  others’  data.  It  is  very  reasonable  and  intuitive  that  the  nodes 
are  rational  (self  interested)  and  the  actions  of  a  node,  in  response  to  others’,  are  aimed  at  maximizing  its 
utility.  In  addition,  the  nodes  adapt  their  behavior  by  learning  their  utility  for  each  potential  action  through 
feedback,  which  is  defined  by  the  overall  objective  function  of  the  network.  In  this  way,  nodes  dynamically 
react  to  changing  network  conditions,  energy  budgets,  and  external  stimuli. 

Concepts  from  game  theory  [16,  38]  make  perfect  sense  when  dealing  with  nodes  that  are  interested  in 
net  earnings  {utility,  payoffs)  for  the  tasks  they  perform  while  interacting  with  others  -  be  it  cooperatively, 
maliciously  or  otherwise.  In  such  a  distributive  environment  where  nodes  make  their  own  decisions,  the 
utility  obtained  by  a  node  not  only  depends  on  what  it  does  but  also  on  what  others  do.  Given  a  set  of 
rational  nodes  in  a  wireless  network,  the  decision  whether  to  cooperate  in  the  process  of  packet  forwarding 
or  how  to  respond  to  others’  actions  can  be  best  analyzed  using  non-cooperative  game  theory.  Moreover,  the 
underlying  unreliable  channel  makes  the  nodes’  actions  hidden  from  each  other.  This  essentially  translates 
to  an  imperfect  information  game.  When  malicious  attacks  exist,  the  regular  nodes  need  to  detect  the 
malicious  ones  through  a  series  of  interactions,  which  can  be  captured  by  Bayesian  games.  Often  times,  the 
same  game  is  played  for  a  number  of  repetitions  (as  in  the  case  with  packet  forwarding),  and  the  observed 
history  may  influence  the  nodes’  future  actions.  Furthermore,  when  the  nodes  can  learn  from  the  results  of 
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the  previously  played  games,  as  the  game  repeats,  the  population  of  the  players  taking  the  same  strategy 
evolves  and  it  is  interesting  to  find  a  prevailing  strategy. 

An  important  notion  in  game  theory  is  the  equilibrium.  It  characterizes  a  steady  state  that  all  the 
players  are  satisfied  with  the  payoffs  and  willing  to  adhere  to  current  strategies.  Given  different  modeling 
of  the  games,  equilibrium  can  be  characterized  with  various  properties,  e.g.,  Nash  Equilibrium  and  its 
rehnement  Sequential  Equilibrium.  To  mitigate  the  misbehavior  in  wireless  network,  we  are  interested  in 
obtaining  certain  equilibria  such  that  adhering  to  the  equilibrium  strategies  leads  to  the  suppression  of 
the  misbehavior.  However,  the  desired  equilibria  are  unique  when  we  take  different  network  settings  into 
account.  For  example,  the  equilibrium  strategy  under  homogeneous  lossy  channel  differs  from  that  under 
heterogeneous  lossy  channel.  While  the  dehnitions  of  various  games  and  equilibria  are  defined  in  the  next 
section,  we  will  give  more  exemplified  explanation  when  we  introduce  the  modeling  of  our  games. 

1.3  Major  Findings  of  this  Project 

Our  main  focus  in  this  project  is  to  provide  a  game  theoretic  analysis  on  how  to  mitigate  the  aforemen¬ 
tioned  misbehavior  in  the  presence  of  unreliable  channels.  The  interaction  among  the  nodes  in  the  wireless 
network  is  presented  by  the  packet  forwarding  process  which  is  further  abstracted  as  a  two  player  game. 
The  misbehavior  of  the  nodes  can  be  categorized  as  two  types,  namely  uncooperative  and  malicious.  For 
the  former  category,  we  attempt  to  obtain  a  condition,  under  which  cooperation  can  be  enforced.  In  the 
latter  category,  the  focus  is  the  detection  of  the  malicious  node  and  the  strategies  after  the  detection.  In 
particular,  this  research  includes  the  following  three  aspects:  i)  cooperation  enforcement  with  homogeneous 
unreliable  channel,  ii)  cooperation  enforcement  with  heterogeneous  unreliable  channel,  and  iii)  detection  and 
co-existence  with  malicious  nodes. 

1.3.1  Cooperation  Enforcement  with  Homogeneous  Unreliable  Channel 

We  begin  with  the  credit  exchange  system,  in  which  both  packet  purse  model  and  packet  trade  model  [10] 
are  investigated.  We  show  that  in  the  former  model,  dominant  strategy  exists  while  the  latter  one  fails  to 
give  enough  incentives  to  foster  cooperation.  Given  certain  incentives,  we  solve  the  forwarding  game  under 
the  general  unreliable  channel  and  derive  the  probability  of  packet  forwarding  that  leads  to  an  equilibrium. 
Furthermore,  we  extend  our  analysis  to  repeated  games;  we  take  several  different  strategy  prohles  and  find 
the  conditions  under  which  cooperation  leads  to  subgame  perfect  Nash  Equilibrium.  Results  show  that 
proper  incentives  leading  to  cooperation  are  related  to  the  belief  of  the  nodes’  continuous  participation  in 
the  game.  Besides  incentives,  a  reputation  based  strategy  which  generates  actions  based  on  the  observation 
of  opponents’  history  is  also  analyzed.  It  is  found  that  achieving  cooperation  with  such  history  dependent 
strategy  does  not  require  any  incentives,  even  when  the  observation  is  not  accurate  due  to  the  unreliable 
channel.  To  address  the  prevalence  of  cooperation,  we  focus  on  the  resistance  on  collusion  using  both 
repeated  games  and  evolutionary  games.  Our  hndings  indicate  a  subgame  perfect  cooperation  enforcement 
strategy  ensures  cooperation  as  a  prevailing  action  if  it  is  evolutionary  stable  or  the  initial  non-cooperative 
population  is  bounded. 

The  main  contributions  in  this  part  can  be  itemized  as  follows. 

•  We  evaluate  the  credit  exchange  system  under  the  unreliable  channel  and  analyze  on  several  well-known 
strategy  profiles  in  which  incentives  lead  to  subgame  perfect  Nash  Equilibrium.  Our  approaches  are 
general  in  nature  and  can  be  applied  to  different  strategy  profiles  and/or  payoff  matrices. 
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•  We  present  a  rigorous  proof  on  the  subgame  perfect  of  a  reputation  based  strategy  (i.e.,  CORE  [34,  35]). 
We  show  that  such  history  based  strategy  does  not  require  any  incentives,  and  full  cooperation  is  the 
subgame  perfect  equilibrium  regardless  of  the  channel. 

•  We  adopt  evolutionary  game  theory  in  capturing  the  population  dynamics.  Analysis  indicates  that  if 
nodes  are  patient  enough  in  the  game  and  value  future  payoffs,  collusion  resistance  and  cooperation 
enforcement  are  equivalent. 

•  We  study  the  convergence  of  the  cooperation  enforcement  through  simulation.  In  particular,  we  show 
cooperation  can  be  enforced  within  the  entire  population  when  cooperative  and  collusion  strategies 
coexist.  The  rate  the  population  convergence  is  affected  by  the  initial  population  share,  channel 
unreliability,  and  the  payoff  matrix. 

1.3.2  Cooperation  Enforcement  with  Heterogeneous  Unreliable  Channel 

The  problems  solved  in  the  previous  section  require  a  revision  when  the  heterogeneous  unreliable  channel  is 
assumed.  The  reason  is  that  under  heterogeneous  unreliable  channel,  every  node  in  the  network  has  different 
and  private  observations.  This  part  of  the  work  is  motivated  by  the  state-of-the-art  advances  in  game  theory 
on  repeated  games  under  private  monitoring  and  strategies  [7,  20,  21].  We  re-model  the  packet  forwarding 
game  considering  noise  and  show  that  although  nodes’  actions  are  hidden  due  to  the  channel,  they  can 
nevertheless  monitor  their  own  payoffs.  Based  on  the  private  observation  of  their  payoffs,  we  construct  a 
forwarding  approach  using  a  two-state  machine.  We  demonstrate  that,  through  carefully  designing  the  state 
transition  parameters,  sequential  equilibria  can  be  achieved  to  enforce  cooperation.  Eurthermore,  we  extend 
our  results  to  a  multi-hop  wireless  network  and  propose  a  multi-hop  packet  forwarding  strategy  to  attain 
the  same  equilibria.  Simulation  results  reveal  that  the  network  is  able  to  achieve  better  throughput  when 
the  proposed  strategies  are  adopted. 

The  highlights  in  this  part  are  as  follows. 

•  We  model  the  packet  forwarding  process  with  channel  noise  as  a  hidden  action  game  with  imperfect 
monitoring  and  propose  a  strategy  profile  to  the  game.  The  strategy  is  shown  to  give  a  sequential 
equilibrium  solution.  Extensive  simulations  show  that  the  cooperation  enforcement  strategy  is  more 
efficient  (Pareto  superior)  over  non-cooperative  ones. 

•  A  multi-hop  packet  forwarding  strategy  based  on  the  state  machine  approach  is  provided  in  a  general 
multi-hop  wireless  network.  Our  simulation  results  indicate  that  the  performance  in  terms  of  network 
throughput  is  very  close  to  a  fully  cooperative  network. 

1.3.3  Detection  and  Co-existence  with  Malicious  Nodes 

When  malicious  nodes  exist  in  the  network,  the  modeling  and  analysis  are  focused  on  the  interactions  between 
a  malicious  node  and  a  regular  node.  In  particular,  we  formalize  the  interactions  into  two  cascaded  games. 
The  first  game,  namely  malicious  node  detection  game,  is  a  Bayesian  game  with  imperfect  information.  The 
information  is  hidden  because  the  malicious  node  can  disguise  as  a  regular  node  and  the  actions  are  hidden 
due  to  the  noise  and  imperfect  observation.  The  second  game,  called  post- detection  game,  is  played  when 
the  regular  node  knows  confidently  that  its  opponent  is  a  malicious  node.  In  the  latter  game,  the  regular 
node  observes  and  evaluates  the  actions  of  the  malicious  node,  and  decides  whether  to  keep  it  or  isolate 
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it.  For  both  games,  we  show  the  existence  of  equilibria  and  derive  the  conditions  that  achieve  them.  To 
address  the  possible  countermeasures  the  malicious  node  might  take,  we  propose  a  nested  belief  model.  In 
this  model,  the  malicious  node  learns  from  its  private  observations  and  predicts  if  the  regular  node  has 
accumulated  enough  information  to  make  the  detection.  Associated  with  the  belief,  we  show  that  a  Markov 
Perfect  Bayes-Nash  Equilibrium  emerges.  We  also  provide  simulation  study  to  support  the  efficiency  and 
other  properties  of  the  equilibria. 

The  main  findings  can  be  categorized  into  three  aspects. 

•  We  model  the  malicious  node  detection  game  under  unreliable  channels  as  a  Bayesian  game  with 
imperfect  monitoring  and  show  a  mixed  strategy  perfect  Bayesian  Nash  Equilibrium  is  attainable. 
The  strategy  profile  is  also  shown  to  give  a  sequential  equilibrium  solution.  Results  show  how  the 
equilibrium  strategy  profiles  are  affected  by  parameters  like  channel  noise,  successful  attack  rate, 
successful  detection  rate,  attack  gain,  detection  gain,  and  false  alarm  rate. 

•  We  propose  the  notion  of  coexistence  after  detection  in  order  to  utilize  the  malicious  node.  A  coex¬ 
istence  index  is  designed  to  evaluate  the  helpfulness  of  a  malicious  node.  We  derive  the  conditions 
under  which  a  subgame  perfect  Nash  Equilibrium  is  achieved.  Through  simulation,  we  also  show  how 
the  malicious  node  can  be  used  to  improve  the  network  throughput  and  extend  network  lifetime. 

•  We  introduce  a  novel  belief  about  belief  model  employed  by  the  malicious  node.  A  Markov  Perfect 
Bayes-Nash  Equilibrium  is  induced  when  both  nodes  constantly  update  their  beliefs.  This  equilibrium 
is  shown  to  delay  the  detection  of  the  malicious  node  and  help  the  malicious  node  actively  adjust  its 
strategy  to  avoid  detection.  This  model  also  helps  to  integrate  the  detection  and  post-detection  games 
with  effective  transition. 

1.4  Related  Work 

Game  theory  [16,  38]  has  been  successfully  applied  to  solve  various  problems  in  wireless  networks  including 
cooperation  enforcement  [12,  13,  14,  18,  35,  36,  41],  routing  protocols  [17,  37,  45,  52]  and  other  system  design 
issues  [4,  26,  27,  30,  31,  50]. 

As  far  as  cooperation  enforcement  in  wireless  networks  are  concerned,  mechanisms  have  been  devised 
that  either  stimulate  nodes  to  forward  each  others’  packets  [8,  10,  12]  or  punish  nodes  for  misbehaving 
[5,  9,  32,  33,  39].  Usually,  selfish  nodes  need  to  be  identified  and  isolated  by  mechanisms  like  Watchdog  [32] 
or  Pathrater  [33]. 

Majority  of  the  proposed  methods  can  be  generally  categorized  into  two  types:  incentive  based  [11,  13, 
41,  51]  and  reputation  based  [17,  34,  35].  Most  incentive  based  protocols  assume  the  network  with  rational 
nodes/agents  and  adopt  the  concept  of  virtual  currency  (e.g.  “nuglets”)  [10]  which  is  a  method  to  reward 
nodes  participating  in  forwarding  packets.  It  has  been  well  established  that  pricing  schemes  (in  terms  of 
reward  and  penalty)  [3,  13],  and  the  security  of  payment  system  [11,  12,  51]  are  closely  associated  with  the 
incentive  based  approaches.  On  the  contrary,  in  a  reputation  based  system,  a  node’s  behavior  is  monitored 
and  measured  by  its  neighbors.  Based  on  the  observed  past  behaviors,  a  node  receives  certain  level  of  services 
or  gets  isolated  for  being  non-cooperative  [9,  32,  33].  An  example  of  reputation  based  scheme  is  CORE  [34], 
where  each  node  maintains  a  reputation  table  for  the  other  nodes.  The  reputation  value  is  updated  based 
on  the  node’s  own  observations  and  the  information  provided  by  the  other  nodes. 
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Meanwhile,  there  have  been  some  interesting  developments  that  use  game  theory  to  analyze  how  cooper¬ 
ation  can  be  achieved  [14,  35,  41,  52],  In  [14],  Felegyhazi  et  al.  formally  define  the  packet  forwarding  game  in 
ad  hoc  networks  and  derive  the  conditions  under  which  cooperation  yields  Nash  Equilibrium.  Michiardi  et  al. 
apply  game  theory  in  [35]  to  analyze  several  strategies  in  the  repeated  prisoner’s  dilemma.  They  also  show 
that  in  order  to  foster  the  coalition  among  cooperative  nodes,  enough  incentives  should  be  granted.  Zhong 
et  al.  [52]  show  that  there  is  no  dominant  strategy  solution  in  a  forwarding  subgame  and  cryptographic 
techniques  can  be  employed  for  the  required  tamper-proof  hardware  support.  A  more  general  framework  on 
cooperation  in  ad  hoc  networks  is  presented  in  [41],  where  Srinivasan  et  al.  focus  on  the  energy  efficiency 
through  cooperation. 

However,  the  aforementioned  efforts  are  not  sufficient  to  completely  understand  and  model  cooperation 
in  wireless  network  in  the  presence  of  noise.  The  noisy  nature  of  the  wireless  channels  makes  the  analysis 
very  challenging.  More  recent  work  [17,  18,  37,  47,  49]  have  called  attention  to  the  effect  of  noise  that 
makes  the  observation  imperfect.  In  [17],  Jaramillo  et  al.  propose  a  distributed  reputation  monitoring  based 
strategy  to  enforce  cooperation  when  the  channel  is  not  loss  free.  Their  strategy  is  proved  to  be  subgame 
perfect  even  if  the  channel  estimation  is  not  accurate.  Studies  on  several  cooperation  enforcement  schemes 
when  channel  collision  exists  are  presented  in  [37,  47],  where  non-cooperative  game  theory  is  used.  In  [49], 
statistical  methods  are  used  to  filter  noise  from  observation  so  that  attacks  can  be  identified.  Ji  et  al. 
[18]  calculate  the  belief  of  nodes  on  others’  actions  and  propose  a  belief-based  multi-node  multi-hop  packet 
forwarding  scheme.  Li  et  al.  [26]  further  generalize  noise  and  imperfect  monitoring  as  hidden  information 
and  hidden  action  games,  and  study  truthful  routing  issues  from  a  mechanism  design  perspective.  Related 
investigations  are  also  shown  by  Feldman  et  al.  in  [15]. 

In  spite  of  these  developments,  cooperation  enforcement  in  wireless  networks  with  noisy  channel  has 
not  been  generalized.  As  a  matter  of  fact,  although  our  work  is  inspired  by  [17,  18],  our  modeling  and 
methodology  is  quite  different  from  existing  work  and  should  not  be  considered  as  a  simple  variant.  In 
[17],  the  implicit  assumption  is  that  the  channels  and  environment  are  identical  around  the  receiver  and  the 
observer;  however,  in  our  model,  the  channels  are  assumed  to  be  heterogeneous.  Sometimes,  the  theories  and 
calculations  are  too  complex  [18].  Furthermore,  the  difficulties  in  hidden  action  game  with  imperfect  private 
monitoring  games  are  generally  two-fold.  First,  when  the  noisy  channel  makes  action  history  unknown  to 
the  public,  the  games  do  not  possess  the  recursive  structure  on  the  equilibrium  [1].  Second,  players  (nodes) 
are  not  sure  about  what  the  opponents  are  going  to  do  because  they  cannot  perfectly  monitor  their  actions. 
In  this  case,  a  player  must  take  the  best  strategy  based  on  her  belief  about  her  opponents’  actions  at  every 
move,  which  is  the  essence  of  the  strategies  proposed  in  [18].  However,  the  drawback  of  the  belief  updating 
strategies  is  that  the  calculations  on  updating  the  beliefs  are  usually  extremely  complex.  Nonetheless,  in 
this  project,  our  proposed  methods  overcome  such  shortcomings. 

Recently,  much  work  has  been  done  that  investigates  the  interactions  between  the  regular  and  malicious 
nodes  using  game  theory.  Kodialam  et.  al.  formally  propose  a  game  theoretic  framework  to  model  how 
a  service  provider  detects  an  intruder  [23].  However,  their  assumptions  of  zero-sum  game  and  complete, 
perfect  knowledge  have  limitations.  Agah  et.  al.  study  the  non- zero-sum  intrusion  detection  game  in  [2]; 
their  results  infer  the  optimal  strategies  in  one-stage  static  game  with  complete  information.  In  [29],  Liu 
et.  al.  propose  a  Bayesian  hybrid  detection  approach  to  detect  intrusion  in  wireless  ad  hoc  networks.  They 
design  an  energy  efficient  detection  procedure  while  improving  the  overall  detection  power.  The  intrusion 
detection  game  with  networked  devices  are  investigated  in  [54],  where  Zhu  et.  al.  introduce  an  N-person  non- 
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cooperative  game  to  study  incentive  compatibility  of  the  collaborative  detection.  [28]  models  the  intention 
and  strategies  of  a  malicious  attacker  through  an  incentive-based  approach.  The  importance  of  the  topology 
on  the  payoffs  of  the  malicious  nodes  are  investigated  in  [42].  An  interesting  flee  option  for  the  malicious 
node  is  proposed  in  [25].  In  that  analysis,  a  malicious  decides  to  flee  when  it  believes  it  is  too  risky  to  stay 
in  the  network.  While  the  approach  focuses  on  how  the  flee  action  affects  the  result  of  the  game,  it  does  not 
consider  the  noise  in  observation. 

2  Game  Theoretic  Definitions  and  Preliminaries 

Game  theory  offers  power  tools  in  modeling  and  analyzing  conflicts  and  cooperation  among  multiple  players 
in  a  system  with  regard  to  strategic  decision  making.  As  a  branch  of  applied  mathematics,  it  is  widely 
applied  to  areas  of  politics,  biology,  engineering  and  more.  In  Section  1.4,  we  have  shown  some  applications 
of  game  theory  in  wireless  communication  and  networking.  In  this  section,  we  formally  review  some  of  the 
fundamental  definitions  and  concepts  in  game  theory  [16,  38,  43]  that  will  be  used  and  applied  throughout 
this  report. 

2.1  Game,  strategy,  and  equilibrium 

A  game  consists  of  players,  the  possible  actions  of  the  players,  and  consequences  of  the  actions.  Formally, 
the  word  “game”  is  defined  as: 

Definition  1.  A  game  T  is  a  triple  (I,S,u),  where 

•  I  =  l,2,...,n  denotes  a  set  of  players. 

•  S  =  Xi^iSi  is  a  space  of  strategy  profiles.  It  is  the  Cartesian  product  of  strategy  profile  Si  for  each  of 
the  player  i. 

•  u  is  a  vector  of  von  Neumann- Morgenstern  utility  functions  defined  over  S.  For  a  particular  strategy 
profile  s,  u(s)  =  {ui{s),U2{s),  ...Ui{s))  is  called  a  payoff  vector  consists  of  individual  payoffs  Ui{s) . 

The  most  fundamental  assumption  in  game  theory  is  that  all  players  in  the  game  are  rational.  A  rational 
player  chooses  actions  to  maximize  her  payoffs.  In  the  case  the  game  is  not  deterministic,  the  player  chooses 
to  maximize  her  expected  utility  (payoffs).  The  idea  of  maximizing  the  expected  payoff  was  justified  by  the 
seminal  work  of  von  Neumann  and  Morgenstern  in  1944  [44];  it  is  characterize  the  probabilistic  distribution 
of  the  payoffs.  As  the  expected  utility  is  governed  by  the  utility  functions,  a  game  essentially  describes  the 
actions  the  players  can  take,  which  are  mapped  to  the  consequences  (i.e.,  payoffs)  of  the  actions  by  the 
utility  functions. 

Since  the  game  is  an  interaction  among  players,  the  payoff  of  a  player  (denoted  as  i)  may  be  determined 
not  only  by  her  actions,  but  also  the  actions  of  other  players’.  In  this  regards,  often  times,  player  i  is 
interested  in  what  strategies  the  rest  of  the  players  in  the  game  take.  We  denote  the  deleted  strategy  profile 
S-i  =  (si, ...,  Si-1,  Si+i, ...,  Sn).  However,  it  might  happen  that  in  a  game,  a  player’s  action  is  superior  to 
others’,  no  matter  what  the  other  players  do. 

Definition  2.  A  strategy  s'  is  a  dominant  strategy  for  player  i  if  ufisi,  S-i)  >  ufisi,  S-fi,  for  every  S-i 
of  the  other  players’  actions.  Similarly,  a  dominated  strategy  s"is  ufis'f,  S-f)  <  ufisi,  s_j).  The  dominance 
is  strict  if  the  inequality  holds. 


A  game  can  be  solved  by  iterated  elimination  of  strictly  dominated  strategies.  However,  not  every 
strategy  profile  has  strategic  dominance.  Even  if  there  is  no  dominant  strategies,  a  player  also  has  her  belief 
on  what  strategies  other  players  will  take,  so  that  she  can  pick  her  response  strategies  accordingly. 

Definition  3.  A  strategy  s'  e  Si  is  a  best  response  strategy  for  player  i  if  and  only  if  Ui{s[,  s^i)  > 

l)?  V  S2  G  Si\Sj^. 

From  the  definition  above,  we  can  see  that  if  the  strategies  taken  by  players  are  mutual  best  to  each 
other,  no  player  would  like  to  deviate  from  the  given  strategy  profile.  The  unwillingness  of  deviation  is  an 
outcome  of  the  game,  and  it  describes  a  steady  condition  that  all  players  reach  while  playing  with  each 
other.  To  identify  the  strategy  profiles  that  lead  to  the  steady  condition,  the  concept  of  Nash  Equilibrium 
is  introduced  and  defined  as  follows: 

Definition  4.  The  strategy  profile  s*  is  a  Nash  Equilibrium,  if 

Ui{s*)  >  Ui{si,s*_fj  for  every  strategy  Si  of  player  i. 

What  Nash  Equilibrium  suggests  is  a  state,  that  none  of  the  players  would  unilaterally  change  the 
strategy  to  increase  the  utility.  Thus  Nash  Equilibrium  brings  the  game  to  a  steady  state,  from  which  the 
players  would  not  like  to  deviate  as  that  would  not  increase  their  benefits  any  more. 

A  classical  representation  of  the  Nash  Equilibrium  is  illustrated  through  the  famous  example  of  Prisoner’s 
Dilemma.  The  matrix  in  Figure  1  shows  the  payoffs  (number  of  years  the  prisoner  will  be  sentenced)  when  an 
strategy  profile  is  adopted,  however,  in  this  example,  no  communication  between  the  prisoners  is  assumed. 


Prisoner  B  stays  Silent 

Prisoner  B  Defects 

Prisoner  A  stays  Silent 

(1  year,  1  year) 

(10  years,  0  year) 

Prisoner  A  Defects 

(0  year,  10  years) 

(3  years,  3  years) 

Table  1:  Payoff  matrix  of  Prisoner’s  Dilemma. 


In  Prisoner’s  Dilemma,  “Defect”  is  the  dominant  strategy  for  Prisoner  A/B,  because  no  matter  what 
strategy  his  opponent  takes,  the  payoff  is  strictly  better  (0  year  is  better  than  1  year,  3  years  are  better  than 
10  years).  Therefore,  (Defect,  Defect)  is  the  Nash  Equilibrium  obtained  by  iterated  elimination. 

Another  classical  Nash  Equilibrium  example  is  called  the  Battle  of  the  sexes  [38],  as  the  payoff  matrix 
which  is  shown  in  Figure  2.  In  this  two-player  coordination  game,  a  couple  prefers  to  go  an  evening  event 
together  but  has  not  made  a  decision  on  which  event  they  will  attend,  a  Football  match  or  Opera.  For 
the  husband,  if  the  wife  chooses  to  go  to  Opera,  his  best  response  strategy  is  to  choose  Opera.  When  the 
husband  chooses  Opera,  the  best  response  for  wife  is  also  Opera.  Thus,  (Opera,  Opera)  is  a  mutual  best 
response  strategy  and  forms  a  Nash  Equilibrium.  However,  (Football,  Football)  is  also  a  Nash  Equilibrium 
point  based  on  the  “mutually  best”  criteria. 

The  above  examples  show  the  game  with  pure  strategies.  A  pure  strategy  provides  a  complete  definition 
of  how  a  player  will  play  a  game.  Or  in  other  words,  a  pure  strategy  gives  deterministic  moves  of  a  player 
for  every  situation  she  could  face.  Apart  from  pure  strategies,  a  player  can  also  play  mixed  strategies.  A 
mixed  strategy  is  an  assignment  of  a  probability  to  each  pure  strategy  available  to  the  player.  The  mixed 
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Wife  chooses  Opera 

Wife  chooses  Football 

Husband  chooses  Opera 

(1,4) 

(0,  0) 

Husband  chooses  Football 

(0,  0) 

(4,  1) 

Table  2:  Payoff  matrix  of  the  Battle  of  the  sexes. 


strategy  allows  the  player  to  choose  randomly  a  pure  strategy,  and  thus  create  infinite  numbers  of  mixed 
strategies  from  the  strategy  set.  The  Nash  Equilibrium  obtained  with  pure  strategies  is  called  Pure  Strategy 
Nash  Equilibrium,  as  opposed  to  the  Mixed  Strategy  Nash  Equilibrium  obtained  with  mixed  strategies. 

2.2  Classes  of  games 

When  game  theory  is  applied  to  model  the  interactions  among  players,  based  on  how  the  games  are  played 
and  what  the  players  know,  the  game  can  be  categorized  into  different  classes.  In  this  section,  we  introduce 
several  classes  of  games  that  are  used  in  our  research. 

•  Repeated  game.  In  most  cases,  games  are  played  continuously  rather  than  one  shot.  The  theory  of 
repeated  game  is  to  capture  how  a  player’s  actions  early  on  can  affect  what  others  choose  to  do  later 
on. 

Definition  5.  A  repeated  game  is  an  extensive  form  game  in  which  stage  game  T  is  played  finite  or  infinite 
number  of  times.  For  each  player,  the  set  of  actions  available  in  any  period  in  T  is  the  same,  regardless  of 
the  time  or  past  actions.  The  payoffs  to  the  players  depend  only  on  the  action  profiles  for  T  in  that  particular 
period,  and  is  independent  of  the  time. 

In  a  repeated  game,  we  consider  the  rth  stage.  Since  all  the  players  know  the  history  /i^  (perfectly  or 
imperfectly),  we  can  view  the  game  starting  at  rth  stage  as  a  subset  of  the  original  game,  and  call  it  a 
subgame. 

Similar  to  a  single  stage  game,  a  repeated  game  can  be  characterized  with  the  concept  of  Nash  Equilib¬ 
rium.  However,  it  is  refined  as  Subgame  perfect  equilibrium. 

Definition  6.  A  repeated  game  strategy  s  is  a  Subgame-Perfect  Nash  Equilibrium  (SPNE)  if  at  each 
subgame,  for  all  players  i 

Si  G  argmax  ufisi,  s_j). 

SiGSi 

If  h  is  the  history  generated  by  s,  then  h  is  the  associated  equilibrium  path. 

To  analyze  the  subgame  perfect  equilibria  in  repeated  games,  One-Shot  Deviation  Property  (OSDP)  is 
extensively  used.  A  strategy  profile  is  SPNE  if  it  satisfies  the  OSDP. 

Definition  7.  One-Shot  Deviation  Property:  no  player  can  increase  her  payoff  by  changing  her  action  at 
the  start  of  any  subgame  in  which  she  is  the  first-mover,  given  the  other  player’s  strategies  and  the  rest  of 
her  own  strategy. 
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•  Perfect  information  game.  A  game  is  called  perfect  information  game  if  all  players  know  all  moves 
that  have  taken  place.  Formally  speaking,  any  players  in  a  perfect  information  game  has  only  one 
element  in  her  information  set,  where  the  concept  of  information  set  is  defined  as: 

Definition  8.  Information  Set  is  a  set  that,  for  a  particular  player,  establishes  all  the  possible  moves  that 
could  have  taken  place  in  the  game  so  far,  given  what  that  player  has  observed. 

Perfect  information  cannot  always  be  guaranteed  due  to  the  opacity  of  the  opponents’  actions  or  the 
inaccuracy  of  the  observation.  In  the  case  of  imperfect  information,  a  player  can  only  conjecture  what  the 
other  players  have  played.  Hence,  probabilistic  analysis  might  be  employed  to  solve  the  game. 

•  Complete  information  game.  Complete  information  game  is  a  situation  in  which  knowledge  about 
every  players  is  available  to  others.  Every  player  knows  the  payoffs  and  strategies  available  to  other 
players. 

Similar  to  perfect  information,  complete  information  may  not  be  achieved  all  the  time.  In  an  incomplete 
information  game,  the  players  would  not  be  able  to  know  the  structure  or  the  utility  functions  of  the  game. 
As  a  consequence,  they  are  not  able  to  predict  the  effect  their  actions  would  have  on  the  other  players. 

It  is  worth  to  mention  that  complete  and  perfect  information  game  are  two  distinct  type  of  games. 
The  perfect  information  describe  the  actions  inside  the  game,  while  the  complete  information  states  the 
knowledge  about  the  game  structure  and  goals  of  the  players.  A  complete  information  game  can  be  with 
imperfect  information,  for  example,  in  Prisoners’  Dilemma,  although  both  prisoners  know  the  penalty  of 
silent  and  defect,  they  do  not  know  what  action  the  other  prisoner  takes.  Nonetheless,  the  incomplete 
information  game  can  be  transformed  into  a  imperfect  information  game  by  introducing  nature  as  a  third 
player.  However,  unlike  regular  players,  the  third  player  does  not  care  about  her  payoffs.  In  our  modeling 
of  the  games,  we  model  the  observation  inaccuracy  due  to  channel  unreliability  as  imperfect  information 
game,  while  we  treat  the  malicious  node  detection  process  as  incomplete  information  game.  Bayesian  game 
analysis  is  applied  to  transform  the  incomplete  information  game  to  imperfect  information  game. 

The  definitions  and  terminologies  introduced  in  this  section  serve  as  the  preliminaries  of  our  design  of 
misbehavior  mitigation  system.  In  the  rest  of  the  report,  we  will  carry  these  definitions.  We  will  also  revisit 
these  definitions  when  we  formally  define  our  games. 

3  Cooperation  Enforcement  with  Homogenous  Unreliable  Channel 

In  this  section,  we  use  game  theory  to  analyze  the  necessary  and  sufficient  conditions  to  enforce  cooperation, 
especially  when  a  node  cannot  perfectly  monitor  other  nodes’  behaviors  due  to  the  unreliability  of  the 
wireless  channel.  In  particular,  we  deal  with  homogeneous  unreliable  channel  in  this  section.  Homogeneous 
unreliable  channel  describes  the  wireless  media  with  the  same  loss  rates  for  every  transmitter  receiver  pair. 
In  contrast,  A  heterogeneous  unreliable  channel  indicates  the  channels  are  lossy,  but  with  different  loss/error 
probabilities. 

The  discussions  are  based  on  the  packet  forwarding  scenario  in  a  multi-hop  wireless  network.  In  Sec¬ 
tion  3.1,  we  analyze  a  credit  exchange  method  under  a  generalized  unreliable  channel  model  and  show  that 
the  packet  forwarding  probability  can  be  adjusted  through  proper  design  of  incentives,  which  in  turn  can 
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be  used  to  attain  the  desired  Nash  Equilibrium.  We  extend  our  discussion  to  repeated  games  in  Section  3.2 
where  we  take  several  well-known  strategy  profiles  and  derive  the  conditions  under  which  the  cooperation 
can  lead  to  a  subgame  perfect  Nash  equilibrium.  In  particular,  we  show  how  the  unreliable  channel  can 
affect  the  conditions  and  how  a  reputation  based  strategy  leads  to  subgame  perfection  even  under  imperfect 
monitoring.  In  Section  3.3,  we  define  the  anti-collusion  game  and  further  investigate  collusion  resistance  and 
cooperation  coalition  formation  using  evolutionary  game  theory.  We  prove  the  existence  of  an  upper  bound 
on  the  population  share  of  the  non-cooperative  nodes  for  an  evolutionarily  non-stable  strategy  that  enforces 
full  cooperation. 

3.1  Analysis  of  Credit  Exchange  System 

The  credit  exchange  system  is  constructed  on  the  notion  that  each  node  gains  certain  amount  of  credits 
after  participating  in  packet  forwarding.  These  credits,  for  example  “nuglets  [10]”,  are  transferable  and 
exchanged  from  one  node  to  another  as  a  payment  for  packet  forwarding.  Moreover,  the  credits  themselves 
are  issued  by  a  central  authority  and  without  forgery,  so  that  the  credit  can  be  used  as  “virtual  currency”. 
As  suggested  in  [10],  there  can  be  two  types  of  charging  models  for  credit  exchange:  Packet  Purse  Model 
(PPM)  and  Packet  Trade  Model  (PTM).  In  PPM,  the  packet  to  be  forwarded  is  initialized  with  a  certain 
amount  of  credit.  Each  forwarding  node  acquires  some  credits  and  forwards  the  packet  to  next  hop.  If  a 
packet  does  not  have  enough  credit,  it  will  be  discarded.  While,  in  PTM,  the  packet  does  not  carry  any 
credit,  but  is  traded  for  credits  among  intermediate  nodes.  Nodes  “buy”  the  packet  from  its  previous  hop 
with  some  credit  and  “sell”  the  packet  to  the  next  hop,  with  some  increased  amount  to  cover  the  forwarding 
cost.  A  packet  will  be  dropped  if  no  node  wants  to  buy  it.  Let  us  further  analyze  the  node  behavior  under 
these  models. 

3.1.1  Packet  Purse  Model 

We  consider  a  set  of  nodes  along  a  pre-defined  forwarding  path,  and  form  a  “forwarding  game” .  The  strategy 
profile  for  each  of  the  nodes  is  s  =  {Forward,  Discard).  We  define  the  cost  to  a  node  for  forwarding  a 
packet  is  1.  The  amount  of  credit  (a)  a  node  charges  for  forwarding  varies  according  to  its  instantaneous 
resource  availability,  for  example,  a  node  with  less  energy  may  charge  more  than  a  node  with  more  energy 
remaining.  It  is  obvious,  a  >  1  for  the  nodes  to  have  incentives  to  forward.  In  the  simplest  form  of  PPM,  a 
node  takes  the  credit  and  forwards  the  packet. 

Lemma  I.  In  PPM,  the  dominant  strategy  is  Discard. 

Proof:  For  node  i,  the  payoff  for  forwarding  is  a  —  1,  and  the  payoff  for  dropping  is  a.  Thus,  all  the  nodes 
will  simply  take  the  credit  and  drop  the  packet.  As  a  matter  of  fact,  since  the  first  node  on  the  route  will 
drop  the  packet  after  acquiring  the  credits,  the  packet  can  never  be  forwarded  beyond  one  hop.  □ 

It  is  clear  that  PPM  cannot  be  directly  applied  because  the  nodes  will  act  selfishly  and  will  have  no 
obligation  to  forward.  Hence,  secure  PPM  (SPPM)  was  proposed  that  addressed  this  issue  by  implementing 
some  credit  validation  mechanisms  [10].  In  SPPM,  the  credit  a  node  acquires  from  the  packet  is  not  valid 
unless  and  until  it  passes  the  packet  to  next  hop  and  receives  the  validation  from  the  next  hop  node.  We 
further  assume  that  the  transmission  of  validation  is  costless,  i.e.,  a  rational  node  will  pass  validation  to  its 
previous  hop  because  it  will  not  gain  any  benefit  not  to  do  so.  However,  due  to  channel  loss,  transmission 
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is  subject  to  be  unsuccessful  with  probability  pe-,  Pe  £  (0)1)-  This  loss  is  applicable  to  both  packet  and 
validation. 

Lemma  2.  In  SPPM,  the  dominant  strategy  is  Forward  if  pe  <  ,  Drop  if  Pe  >  2a\  ■ 

Proof:  Since  the  nodes  are  rational  and  they  do  not  obtain  any  beneht  by  dropping  the  validations,  they 
will  only  adopt  pure  strategies.  The  expected  payoff  of  forwarding  a  packet  is  (a  —  1)(1  —Pef'-,  if  both 
the  packet  transmission  and  validation  feedback  are  successfully  transmitted  over  the  unreliable  channel.  If 
a  node  chooses  to  drop  a  packet,  it  might  still  get  a  payoff  of  a(l  —  Pe)Pe,  should  the  next  hop  node  be 
“merciful”.  In  this  case,  the  next  hop  node  believes  that  the  packet  had  been  forwarded  but  lost  due  to  the 
channel,  and  nonetheless  passes  the  validation  back  and  is  successfully  received.  The  payoff  of  Forward  is 
strictly  greater  than  the  payoff  of  Drop  if  Pe  <  Else,  when  pe  >  ^zy,  the  payoff  of  Drop  is  greater, 

which  means  that  Forward  is  a  dominated  strategy  and  will  never  be  chosen.  However,  if  the  next  hop  node 
is  not  “merciful”,  the  payoff  for  Diseard  is  zero.  Thus  as  long  as  a  >  1,  Forward  is  the  dominant  strategy.  □ 
From  the  discussion  above,  we  can  see  that  the  validation  forces  the  nodes  to  forward  packets  in  order 
to  obtain  the  payoff.  However,  with  a  very  unreliable  channel,  a  node  may  still  be  willing  to  drop  because 
the  lossy  channel  makes  forwarding  non-prohtable.  Thus,  in  designing  the  incentives,  we  make  the  following 
claim. 

Corollary  1.  In  SPPM,  cooperation  is  enforced  if  a  >  yz^- 

3.1.2  Packet  Trade  Model 


Node  i 


Node  —i 


Buy 

Discard 


Buy  Discard 


ai  —  ai  —  a-i  a-i  —  a-i  —  ai 

—ai  —  a-i  a-i  —  a-i 

ai  —  ai  —a-i  —  ai 

Table  3:  Payoff  matrix  of  two-player  forwarding  game  under  PTM. 


For  the  packet  trade  model,  we  reconstruct  the  “forwarding  game”  as  a  reciprocal  two-player  game. 
Under  this  game  model,  any  two  neighboring  nodes  have  bidirectional  network  traffic  demands  [17],  i.e., 
they  rely  on  each  other  to  forward  packets.  This  assumption  can  be  regarded  as  an  abstract  scenario  when  a 
node  is  playing  the  same  game  versus  all  the  other  nodes  along  the  path.  We  model  the  two-player  forwarding 
game  as  both  nodes  forward  packets  to  each  other  at  the  same  time,  and  then,  they  simultaneously  decide 
whether  to  buy  the  packet  or  to  discard  it.  We  denote  two  neighboring  nodes  as  i  and  —i.  For  node  i,  the 
price  (cost)  it  pays  to  buy  the  packet  from  its  previous  node  is  dj,  and  the  price  at  which  it  sells  to  node 
—i  is  Oj.  These  prices  are  a-i  and  a-i  when  we  consider  the  node  —i.  The  payoff  matrix  for  the  nodes  is 
shown  in  Table  3. 

In  this  game,  since  nodes  will  obtain  beneht  by  selling  packets,  ax  >  cix  >  0;  x  =  i,  —i.  It  is  obvious  that 
the  strategy  Buy  is  strictly  dominated  by  Discard  for  both  nodes.  Hence,  if  we  assume  that  both  nodes  will 
only  adopt  pure  strategies,  the  game  has  the  same  form  as  the  well-known  Prisoner’s  Dilemma  game  [38], 
although  {Buy,  Buy)  is  more  desired.  Thus  we  have. 


Lemma  3.  In  PTM,  the  strategies  for  attaining  Nash  Equilibrium  is  (Discard,  Discard). 
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3.1.3  Hybrid  Model 


Neither  PPM  nor  PTM  provides  enough  insight  for  designing  proper  incentives  for  cooperation.  Thus,  we 
consider  a  hybrid  model  which  is  a  combination  of  PPM  and  PTM.  We  still  consider  the  “forwarding  game” . 
We  continue  to  assume  that  the  links  are  bidirectional,  i.e.,  both  nodes  forward  the  packets  to  each  other, 
and  then  a  decision  to  forward  or  discard  is  made.  As  shown  in  Figure  1,  node  Na  relies  on  node  Nb  to 
forward  packet  to  Nc,  and  Nb  relies  on  Na  to  forward  packet  to  Nb-  The  cost  of  forwarding  is  1,  and  nodes 
get  a  as  reward  if  their  packet  is  forwarded  by  their  counterpart.  However,  a  reward  cannot  be  granted 
unless  packets  in  both  directions  are  forwarded,  otherwise,  the  node  which  discards  gets  no  payoff  and  the 
other  node  consumes  1  for  forwarding.  The  payoff  matrix  is  shown  in  Table  4. 


< -  e 


Nd  Na  - >  Nb  - >  Nc 


Figure  1:  Packet  forwarding  game. 


Node  i 


Forward 

Discard 


Node  —i 


Forward  Discard 


a  —  1  a  —  1 

-  1  0 

0  -1 

0  0 

Table  4:  Payoff  matrix  of  two  player  forwarding  game. 


It  is  shown  in  [52]  that  there  is  no  such  strategy  profile  in  which  forwarding  is  always  the  dominant 
strategy.  This  also  infers  that  the  actions  or  strategies  nodes  take  are  hidden  from  each  other.  It  is  even 
more  obvious  when  we  consider  an  unreliable  channel.  When  a  node  drops  a  packet,  it  is  impossible  for 
others  to  distinguish  whether  the  dropping  was  intentional  or  due  to  the  unreliability  of  the  channel.  In  this 
type  of  game,  where  the  actions  of  the  nodes  are  hidden  from  each  other,  nodes  have  enough  reason  (even  if 
no  incentive)  to  play  mixed  strategies.  Or  rather,  they  are  perceived  to  be  playing  mixed  strategies  simply 
because  the  perception  is  not  accurate  through  the  unreliable  channel. 

Let  us  consider  the  equilibrium  point  when  mixed  strategies  are  allowed. 


Lemma  4.  The  two-player  forwarding  game  has  a  mixed  strategy  Nash  Equilibrium  when  both  nodes  play 
Forward  with  probability  P  = 

Proof:  We  denote  the  probability  of  Forward  for  node  i  as  pi,  and  the  probability  of  Forward  for  node  —i 
as  p-i.  The  expected  payoff  of  node  i  is  piP-i{a  —  1)  —  Pi{l  —  P-i)-  The  best  response  for  node  i  is  Discard 
when  P-i  <  and  Forward,  when  For  node  — i,  the  best  response  strategy  is  similar;  Forward, 

when  Pi  >  ^,  and  vice  verse.  The  intersection  of  the  two  best  response  strategies  is  the  Nash  Equilibrium, 
where  both  nodes  play  Forward  with  probability  pi  =  p-i  =  ^-  □ 

Further,  we  consider  the  unreliable  channel  and  re-solve  the  game.  The  variation  the  channel  brings  in 
is  that  it  is  still  possible  that  even  node  —i  plays  Forward,  the  credits  are  lost  due  to  transmission  failure, 
so  that  it  appears  to  be  a  Discard  for  node  i  and  hence  harm  the  payoff  of  node  i. 
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Lemma  5.  Under  unreliable  channel,  the  two-player  forwarding  game  has  a  mixed  strategy  Nash  Equilibrium 
when  both  nodes  play  Forward  with  probability  p  =  ^ . 

Proof:  We  use  the  same  notations  in  Lemma  4.  The  expected  payoff  of  node  i  is  (1  —  pe)\piP-i{a  —  1)(1  — 
Pe)  —  Pi{^  —  P-i  +P-iPe)]-  The  best  response  for  node  i  is  Discard  when  ^ ,  and  Forward,  when 

P-i  >  same  reasoning  is  valid  for  node  —i.  Thus  the  equilibrium  reaches  when  both  nodes  play 

Forward  with  probability  p  =  ^ .  □ 

Let  us  further  generalize  the  channel  model.  We  consider  a  channel  with  M  different  states,  with  channel 
loss  probability  Pe{j)  for  the  jth  state.  We  also  define  a  M  x  M  channel  transition  probability  matrix  A, 
in  which  is  the  probability  that  channel  goes  from  state  j  to  k.  A  special  case  of  this  model  is  the 
widely  used  two  state  (Good  or  Bad)  channel  model,  where  channel  transfers  from  one  state  to  another  with 
some  probability.  The  updated  problem  now  can  be  stated  as:  Given  a  user’s  current  channel  state  j  and 
transition  probability  matrix  A,  how  do  we  solve  the  game  as  defined  in  Table  4. 

To  solve  this  game,  we  bring  the  channel  as  the  third  player.  Unlike  node  i  or  —i,  the  channel  does  not 
care  for  a  payoff,  and  the  game  can  be  solved  with  only  payoffs  of  node  i  or  —i.  If  the  channel  condition 
goes  from  j  to  k,  the  payoff  to  node  i  is  piP-i{a  —  1)(1  —pe{k))  — Pi(l  —p-i  +p-iPe{k)).  However,  this  payoff 
depends  only  on  probability  of  Oj^k-  To  consider  all  the  channel  transition  probabilities,  node  i’s  payoff  with 
j  as  the  current  channel  state  is 


u{i)  =  aj,fc(l  -Pe(fc))  pip-i{a-l){l-pe{k)) -pi{l-p-i+p-iPeik)) 

k,k^j 


(1) 


The  best  response  for  node  i  is  to  set  pi  as 

Pi  = 


1 


(2) 


If  both  nodes  i  and  —i  play  the  same  mixed  strategy  with  Forward  probability  in  equation  (2),  it  forms  a 
mixed  strategy  Nash  Equilibrium. 

Thus,  we  extend  Lemma  5  for  the  generalized  channel  model  as  the  following  Lemma  with  the  discussion 
above  serving  as  the  proof. 


Lemma  6.  Under  a  generalized  unreliable  channel  model  with  channel  loss  probability  Pe{j)  for  the  jth  state 
and  probabilities  for  the  state  transitions,  the  two-player  forwarding  game  has  a  mixed  strategy  Nash 
Equilibrium  when  both  nodes  play  Forward  with  probability  p  =  - ttyy  . 


Further,  we  have 

Corollary  2.  Under  a  generalized  unreliable  channel  model,  Nash  Equilibrium  is  achieved  if  incentive  a 
is  made  equal  to  n - nwr- 

To  summarize,  we  have  the  the  following  implications,  i)  The  nodes’  strategies  are  affected  by  the  channel 
loss  probability.  We  cannot  ignore  the  wireless  channel  condition  when  designing  credit  exchange  systems, 
ii)  When  nodes’  actions  are  hidden,  mixed  strategy  can  lead  to  a  practical  Nash  Equilibrium.  Due  to  the 
selfishness  of  the  nodes,  pure  strategies  are  mostly  impractical,  hi)  The  pricing  model  used  by  the  nodes 
should  be  carefully  evaluated  to  incorporate  factors  like  channel  loss  and  best  response  strategies. 
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3.2  The  Repeated  Game 

So  far,  we  have  discussed  the  strategies  and  payoffs  in  one-shot  packet  forwarding  game  (i.e.,  stage  game).  In 
this  section,  we  will  analyze  the  strategies  the  nodes  will  adopt  in  a  repeated  game,  which  is  the  repetitions 
of  the  same  stage  game.  We  define  the  repeated  packet  forwarding  game  as: 

Definition  9.  The  repeated  packet  forwarding  game  G  is  a  two-player  repeated  game,  with  a  space  of 
strategy  profile  S,  and  a  vector  u  of  von  Neumann- Morgenstern  utility  functions  defined  over  S.  Thus, 
G  =  ({1,2},5,  tt)  where  S  =  and  pf'^  is  the  probability  in  stage  r  that  player  i  plays  Forward, 

u  =<  Ui,  U2  >,  and  Ui  =  Ylir>t  is  the  discounted  payoff  of  player  i  from  the  tth  stage. 

The  non- negative  parameter  6  is  called  the  discount  factor  and  5  €  (0, 1).  The  discount  factor  infers  the 
preference  of  time  or  patience.  A  large  5  shows  a  player’s  patience  in  the  game  and  good  valuation  of  payoffs 
she  gets  in  future  stages,  while  a  small  6  means  that  the  player  is  more  eager  for  immediate  payoffs  and  has 
higher  probability  of  leaving  the  game  after  each  stage. 

In  the  presence  of  unreliable  channel,  when  a  packet  transmitted  by  node  —i  fails  to  reach  node  i,  node  i 
cannot  instantly  distinguish  whether  node  —i  intentionally  dropped  the  packet  or  it  was  due  to  channel  loss. 
However,  statistically,  node  i  could  nonetheless  gather  information  and  calculate  an  observed  forwarding 
probability  for  node  —i  in  the  following  way. 

Definition  10.  The  observed  forwarding  probability  of  player  —i  at  stage  r  is 

P-l  =  {1  -  Pe(.k))p^^l 


assuming  the  ehannel  is  at  state  k. 


Definition  11.  The  observed  payoff  of  player  i  at  stage  r  is 


k,k^j 


=  -  Pe{k))ap^I_l  -  l]pt^ . 

k,k^j 

Definition  12.  The  observed  average  discounted  payoff  of  player  i  is 


U,  =  {1-6)Y,S^4^- 

t>o 

The  (1  —  S)  term  is  to  unify  the  sum,  so  that  the  summation  is  1  if  there  are  infinite  number  of  stages. 


In  the  remaining  part  of  this  section,  we  present  some  of  the  well-known  strategies  (Last  Step  Trigger  [38], 
Naive  Grim  Trigger  [38],  Grim  Trigger  [38],  and  CORE  [34,  35])  and  analyze  their  behaviors  as  well  as 

(r) 

limitations.  In  order  not  to  introduce  any  confusion,  we  denote  p)  f  as  the  probability  node  i  should  take  to 
forward  packet  at  rth  stage  according  to  the  strategy  profile  S. 
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3.2.1  Last  Step  Trigger 

Definition  13.  The  strategy  of  Last  Step  Trigger  (LST)  is  defined  as 


Pi,LST 

(r) 

Pi, LST 


1 


1  ifP-i  =  1 
0  otherwise 


From  Definition  10,  we  know  that  it  is  not  possible  for  =  1  under  channel  with  Pe{k)  0.  Thus, 
both  nodes  will  play  Diseard  from  the  second  stage,  no  matter  what  they  played  in  the  first  stage.  In  other 
words,  there  is  no  non-trivial  equilibrium  point. 

Thus,  to  show  some  tolerance  towards  channel  loss.  Last  Step  Trigger  strategy  sets  a  threshold  value  (p) 
for  observed  forwarding  probability. 


Definition  14.  The  strategy  of  Last  Step  Trigger  with  channel  loss  tolerance  is  defined  as 


JO)  _  1 

Pi, LST  —  ^ 

J^)  -  I 

Pi,LST  —  ) 


1 


if  p^li  >  P 


0  otherwise 


(r) 

where  p_  is  defined  in  Definition  10. 


Lemma  7.  In  repeated  packet  forwarding  game,  if  both  players  use  LST,  the  subgame  perfect  Nash  equilibrium 
(SPNE)  is  Forward  if  and  only  if  pe{kr)  <  1  —  p  and  forwarding  cost  conforms  to  (f). 

Proof:  We  denote  Pe{kr)  as  the  channel  loss  probability  for  the  rth  subgame,  and  the  initial  state  is  /cq. 
a(r,r+i)  is  the  channel  transition  probability  from  rth  subgame  to  the  immediate  next  stage.  If  Pe(^r)  —  p 
in  the  first  stage,  =  1  and  =  1  —  Pe{k-i);  the  observed  payoff  for  node  i  is 

ft®  =  a(_^o)(«  -  1  -  ape{ko)). 

The  observed  average  discounted  payoff  for  node  i  is 

Ui  =  {1-  5)'^6''a(^r-i,r){ct  -  1  -  ape{kr)).  (3) 

r>0 

The  payoff  is  the  same  for  node  —i.  If  node  i  unilaterally  deviates  at  the  first  stage  by  setting  pf^^  =  0,  its 
observed  payoff  at  this  stage  is  0.  In  the  next  stage,  p[^^  =  1,  =  0,  and 

.(1) 

Ui  =  -a(-i,o)- 

Using  the  same  logic,  the  observed  average  discounted  payoff  for  node  i’s  deviation  is 

Ui^dev  =  (i-  ~  <^^^'*~^(~Q(2r+1.2r+2))- 

r>0 

Using  one-shot  deviation  property  (OSDP;  see  Definition  7),  when  Ui^dev  <  Ui,  node  i  has  no  incentive  to 
deviate.  After  some  algebraic  manipulations,  this  condition  reduces  to 

^  "  <^^''^^«(2r+l,2r+2)] 

-PeCfcr))] 

If  Pe{kr)  >  1  —  p,  from  second  stage,  both  nodes  will  get  zero  payoffs.  Thus,  no  equilibrium  point  exists.  □ 
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As  a  special  case  of  Lemma  7,  if  the  channel  is  static,  with  the  same  loss  probability  of  pe,  then 


JJi  =  {1  —  5)  6^ {a  —  1  —  ape)  =  a  —  1  —  ape- 

r>0 


Under  OSDP,  the  restriction  on  discount  factor  reduces  to 


«(1  -Pe) 
ape  +  1  —  a 


<6  <1. 


As  a  matter  of  fact,  what  Lemma  7  states  is  a  pure  strategy  SPNE  where  both  nodes  adheres  to  LST  and 

(r)  (r) 

forwards  or  discard  with  probability  1,  i.e.,  p\  1st  ~  Pi  ■  However,  since  the  channel  is  unreliable,  as  long 
as  node  i  forward  packet  with  probability  p^f^  {ptiST  such  that  p^p  >  p,  node  —i  will  cooperate  in 

the  next  stage.  Thus,  mixed  strategy  equilibrium  analysis  can  be  applied  to  LST,  and  when 
mixed  strategy  SPNE  can  be  achieved. 

We  follow  the  same  procedure  as  the  pure  strategy  analysis  to  find  the  mixed  strategy  SPNE.  When 

(r)  (r) 

nodes  i  and  —i  play  Forward  with  probabilities  p]  '  and  p_-  respectively,  the  observed  average  discounted 
payoff  for  node  i  is 

Ui  =  (1  -  Pe{kr))ap^^l  -  l]p'f\  (6) 

r>0 

The  observed  average  discounted  payoff  for  node  i  under  deviation  is 


Ui^dev  —  —(1  —  <^)  (^^^'*~^Q(2r+l,2r+2)T 


(2r+2) 


Therefore,  according  to  OSDP,  the  condition  to  attain  mixed  strategy  SPNE  is 

a  >  - = - 1  \  t  \ - •  (o) 

Er>o[<^''«(r-l,r)(l  -Pe(fer))b_]pi 

Eor  the  special  case  of  static  lossy  channel  and  pf''^  =  ^  (m  ,  the  mixed  strategy  SPNE  condition  reduces 


— ^  <6<1.  (9) 

I  —  ap 

Eigure  2  shows  the  sensitivity  of  the  mixed  strategy  SPNE  obtained  with  LST.  The  plots  represent  the 
range  of  5  for  a  given  value  of  p  or  a.  The  plots  also  show  the  properties  of  pure  strategy  SPNE,  because 
when  =  1,  p  =  1  —  pe(A:).  It  is  suggested  from  the  plots  that  a  large  a  value  corresponds  to  a  smaller  p 
value,  which  means  large  incentives  relax  the  unreliability  in  the  channel. 

3.2.2  Naive  Grim  Trigger 

Definition  15.  Naive  Grim  Trigger  (NGT)  is  defined  as: 

JO)  _  1 

(  (r') 

Jr)  _  j  1  ifpLi  >  p  for  allr  <r 
i,NGT  \  Q  otherwise 
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Figure  2:  Sensitivity  of  SPNE  when  LST  is  adopted. 


N-GRIM  is  an  unforgiving  strategy.  A  node  is  punished  if  it  deviates.  If  both  nodes  play  Forward  and 
Peikr)  <  1  —  P,  the  observed  average  discounted  payoff  is  given  by  equation  (3). 

Let  us  consider  that  node  i  unilaterally  deviates  at  the  first  stage  by  setting  p®  =  0;  it  gets  zero  payoff 
at  the  first  stage.  In  the  second  stage,  by  observing  node  i’s  dropping  behavior,  node  —i  sets  =  0,  thus 
the  payoff  for  node  i  is  =  a(_i  o)(— 1).  No  further  payoffs  will  be  gained  in  future  subgames,  because 
both  nodes  will  punish  each  other  for  their  prior  dropping  behavior.  Under  OSDP,  N-GRIM  attains  subgame 
perfect  Nash  equilibrium  if  and  only  if  (1  —  S)  Ylr>o  —  1  —  ape{kr))  >  a(o,i)  ^  (“1);  which  is 

equivalent  to 

^  (1  “ Sr>0 

The  mixed  strategy  SPNE  analysis  for  N-GRIM  is  similar  to  what  we  have  done  for  LST.  Equilibrium  can  be 
attained  when  ,  and  the  mixed  strategy  SPNE  and  its  pure  strategy  counterpart  are  equivalent 

when  p  =  1  —  pe{k). 

N-GRIM  only  punishes  the  nodes  if  they  deviate.  Often  times,  a  severe  punishment  is  set  up  to  enforce 
cooperation  between  nodes.  A  Grim  Trigger  is  such  a  strategy  that  punishes  a  node  for  its  own  deviation, 
not  just  others. 


3.2.3  Grim  Trigger 

is  defined  as: 

1 

f  1  if  p^fi-\ pf"  ^  >  p  for  all  r' <  r 
[  0  otherwise 

Using  the  same  reasoning,  if  node  i  unilaterally  deviates  at  the  first  stage,  it  does  not  gain  any  payoff. 
Afterwards,  according  to  GRIM,  both  nodes  will  play  Discard.  Therefore,  Ui^dev  =  0.  Thus,  the  sufficient 
condition  for  subgame  perfect  Nash  equilibrium  under  OSDP  is  a  —  1  —  ape{kr)  >  0.  Hence, 

(11) 


Definition  16.  Grim  Trigger  (GRIM) 

JO) 

Pi, GRIM  — 

(r)  _ 

Pi, GRIM  — 
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In  the  aforementioned  analysis  on  three  trigger  strategies,  we  have  shown  that  pure  strategy  SPNE  is 
attainable  and  it  can  enforce  full  cooperation.  However,  the  conditions  to  achieve  pure  strategy  SPNE  are 
associated  with  the  channel  loss.  Furthermore,  because  of  the  channel  loss,  mixed  strategy  SPNE  is  also 
feasible;  nonetheless,  mixed  strategy  SPNE  is  not  desirable  because,  it  does  not  enforce  full  cooperation  (i.e., 

(P) 

p\  <  1).  In  other  words,  with  the  trigger  strategies,  unless  the  channel  loss  rates  are  known,  cooperation 
is  hard  to  achieve. 

3.2.4  CORE 

CORE  is  a  complex  strategy  for  cooperation  enforcement  proposed  in  [34,  35].  It  is  similar  to  the  well-know 
Tit-For-Tat  (TFT)  but  different  from  TFT  as  it  considers  the  last  b  stages  in  the  repeated  game. 


Definition  17.  The  CORE  strategy  can  be  defined  as: 


JO) 

Pi, CORE 


=  1 


(r)  _ 

Pi, CORE  — 


1  = 

0  otherwise 


where  the  sth  stage  observed  history  h^  =  and 


f  1  if  X  >  1 

\  —  1  if  X  <  1 


Theorem  1.  CORE  is  subgame  perfect  if  and  only  if 


0  <  a  <  - . 

1  -Pe 


Proof:  CORE  is  a  6-stage  history  strategy  because  it  takes  into  account  what  happened  in  the  last  6  stages. 
Without  loss  of  generality,  let  us  assume  that  any  history  h"'  results  in  Bi  =  m,  where  m  is  uniformly 
distributed  over  [—6,6],  and  m  is  a  discrete  integer  random  variable.  For  the  sake  of  discussion,  we  assume 
B-i  >  0  and  the  channel  has  a  loss  probability  pe-  We  focus  on  node  i  from  the  first  stage  (starting  stage). 
In  the  proof,  we  will  calculate  the  expected  payoffs  for  node  i  considering  different  values  m  can  take.  A 
certain  m  value  leads  to  a  specific  payoff,  and  it  comes  with  a  probability  as  m  changes.  We  first  formalize 
the  expected  payoff  function,  and  then  derive  the  probability  terms  in  the  function. 

Case  1.  If  node  i  conforms  to  CORE,  then  its  payoff  at  the  first  stage  would  be 

if®  =  P(m  =  0)[(l-pe)hf°’’’"“''‘^  +  Penf"“"'^] 

+  P(m  >  +  P(m  <  0)hf (12) 


where  P(-)  means  the  probability  and  ^discard  observed  payoffs  of  “Forward”  and  “Discard”. 

From  Definition  11,  we  know  that 


^  forward  /-i  \  i 

ui  =  a{l-pe)-l 


(13) 


^discard 


0. 


(14) 


Consider  the  next  stage,  node  —i  will  update  Bi  with  node  i’s  stage  0  behavior,  and  we  denote  the  updated 
Bi  value  as  m' .  Similar  to  equation  (12),  the  observed  payoff  at  this  stage  is 

=  P(m'  =  0)[(l-Pe)hf~''+Pehf"“''‘'] 

+  P(m'  >  +  P(m'  <  0)hf 


20 


With  the  process  of  m  to  m!  taken  into  consideration,  we  have  the  following  equations. 


(15) 

(16) 
(17) 


P(m'  =  0)  =  pePim  =  0)  +  (1  —  pe)^{m  =  — 1) 
P(m'  >  1)  =  PeP{m  >  2)  +  (1  —  pe)Y*{m  >  0) 
P(m'  <  0)  =  PeP{m  <  0)  +  (1  —  Pe)^{m  <  1) 


Since  the  distribution  of  m  is  already  known,  we  can  obtain  the  probabilities  as 

P(m  =  x)  =  (18) 

T’  -I-  /)  -I-  1 

P(m<x)  =  ,  ^  (19) 

where  integer  value  x  €  [—6,  6]. 

Further,  let  us  consider  the  Zth  stage.  We  denote  Ip  as  the  random  variable  of  positive  scores  gained,  i.e., 
the  cooperation  action  is  observed  successfully  by  the  opponent  {Bi  =  Bi  +  1),  In  as  the  random  variable  of 
negative  scores  gained,  i.e.,  the  cooperation  action  is  not  observed  by  the  opponent  {Bi  =  Bi  —  1).  Since 
node  i  conforms  to  CORE,  and  forwards  at  each  stage,  any  gain  in  is  due  to  the  channel  loss.  Thus,  Ip 
and  In  are  binomially  distributed  and 

P{lp  =  Lp)  =  Cl^^{l-pe)^^pi-^^  (20) 

P(/„  =  Ljv)  =  (21) 

We  denote  a  random  variable  y  =  Bi.  From  the  analysis  above,  we  know  y  =  m  +  lp  —  In-  Similar  to 
and  we  can  write  as 


uf  =  P(y  =  0)[(l-Pe)«f“‘'+Penf*“’’"] 
+  P(y  >  +  P(y  <  0)uf"“''‘='. 


(22) 


Hence,  the  problem  relies  on  obtaining  the  probability  distribution  of  y.  Since  y  =  m  +  Ip  —  {I  —  Ip)  = 
m  +  2lp  —  I  and  I  is  a  constant,  to  get  the  distribution  of  y,  we  first  get  the  distribution  of  w  =  y  +  1. 

We  use  the  probability  generation  function  (pgf).  For  discrete  random  variable  x,  its  pgf  is  defined  as 

OO 

Gx{z)  =  E[z^]  =  z^P{X  =  x)  (23) 

X=Q 

The  pgf  for  w  is 


Gwiz) 


E[z^]  =  ^  E[z^]E[z‘^^P] 

b  I 


Z-b  _  ^b+1 

(26+l)(l-z) 


\Pe  +  (1 


Let  f^^Hx) 


d"f(x) 
dx"  ’ 


P(r(;  =  k) 


For  the  probability  terms  in  equation  (12), 


gg^(O) 

k\ 


P{y  =  0)  =  P(r(;  =  1) 


gS(o) 

n 


(24) 


(25) 


(26) 
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(27) 


P(y  >  1) 


P(i(;  >  /  +  1) 


E 


n=/H-l 


nl 


P{y  <  0) 


P(t(;  </)  =  !  — 


gig(O) 

ll 


E 


n=/+l 


g£^(o) 

n! 


(28) 


Plugging  equations  (26),  (27)  and  (28)  back  to  equation  (22),  we  can  get  the  observed  payoff  for  node 
i  at  rth  stage.  Further,  if  node  i  conforms  to  CORE,  its  observed  average  discounted  payoff  is  given  as 
Ui  =  (1  —  5)  .  Algebraic  manipulation  reduces  to 


tji  =  {I-  (5)[a(l  -pe)  -  1] 


I -Pe  +  b 
26  +  1 


+  E«’' 

r=l 


(1  -Pe) 


G+O)  ,  +  gS;>(0) 


rp  I 


b+r 

+  E 

n=r+l 


n\ 


(29) 


Case  2.  If  node  i  does  not  conform  to  CORE,  and  it  deviates  in  the  first  stage,  the  payoff  at  the  first 
stage  is 


u. 


i.dev 


=  P(m  >  1)^^ 


+  P(m  <  0)u(* 


(30) 


Since  the  deviation  lasts  only  for  one  stage,  in  the  next  stage,  node  i  will  again  play  CORE.  However,  Bi  is 
updated  as  =  m  —  1,  which  we  denote  as  m* .  It  is  not  hard  to  obtain  the  cumulative  distribution  function 
(cdf)  of  m*  as 


P(m*  <  x) 


0  if  X  <  —6  —  1 

if  -  6-  l<x<6-l 
1  if  X  >  6  —  1 


(31) 


Using  the  same  notations  defined  in  Case  1,  we  let  y*  =  m*  +  lp  —  In  and  W*  =  y*  +  1.  The  pgf  for  W*  is 

fe-i  i 

Gw.(z)  = 

n=—b—l  n=0 

z-^-^  - 
(26  +  l)(l  -z) 

Similarly,  the  observed  average  discounted  payoff  for  node  i  given  it  deviation  on  the  first  stage  is  presented 
in  equation  (33). 


be  +  (1  -Pe)^] 
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(32) 


Ui^ev  =  (1  -  (5)[a(l  -  Pe)  -  1] 


26 


26+1 


r=0 


<Uo) 


(1  -  Pe) 


b+r 


p  I 


+  E 

n=r+l 


n\ 


(33) 


Under  OSDP,  CORE  is  subgame  perfect  if  deviation  is  not  profitable,  or  IJi^dev  <  Ui.  We  denote 
A  =  (1  -  Pe)^  +  Ent;+1  and  R  =  (1  -  Pe)^^  +  Ent;+1  ThuS, 


n! 

oo 


U^4ev-U^  =  (1  -  d)[a{l  -  Pe)  -  1][^  ^  ^  +  B\r=o  +  ^  S^{5B  -  A)] 

r=l 


(34) 


Since  terms  A  and  B  are  probabilities,  0<A<1,0<R<1,  equation  (34)  can  be  further  reduced  as 


Since  6  >  1, 


Gi^dev  Gi  + 


< 


(1  -  «)[o(l  -Pe)  -  +  B|e.O  -  «B|e.ll 

(l-«)Ml-Pe)-l]t±£^ 


1 

0  <  a  <  - . 

1  -Pe 


(35) 

(36) 

□ 
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fr)  (r) 

Corollary  3.  When  both  nodes  adopt  CORE,  the  equilibrium  point  is  pi  =  p_-  =  1,  V  r  >  0.  Cooperation 
is  hence  achieved. 

Remark  1 :  The  theorem  suggests  that  CORE  does  not  enforce  cooperation  by  rewards,  but  punishment. 
An  incentive  based  strategy  bears  the  basic  constraint  that  a  >  1,  so  that  the  forwarding  is  profitable.  In 
CORE,  even  if  the  channel  is  reliable,  a  <  1.  This  property  can  be  further  generalized  to  any  strategy  where 
decision  is  based  on  past  action  profiles,  especially,  when  a  series  of  past  actions  are  considered. 

Remark  2:  CORE  can  be  regarded  as  a  “reputation  based”  strategy.  Nodes  gain  reputation  when  for¬ 
warding  is  observed  and  loose  reputation  when  discarding  is  observed.  Although  noise  exists  (channel  is  not 
reliable),  CORE  can  still  lead  to  subgame  perfect  Nash  equilibrium,  and  hence,  full  cooperation.  Simulation 
results  presented  in  Section  6.1  also  show  the  effectiveness  of  CORE  under  “imperfect  monitoring”. 

3.3  Collusion  Resistance  and  Coalition  Formation 

In  this  section,  we  consider  how  cooperation  is  enforced.  In  particular,  we  address  two  aspects;  i)  how 
to  resist  collusion  among  nodes  that  deviate  from  the  cooperation  strategy,  and  ii)  how  the  population 
of  cooperative  nodes  grows  and  cooperation  prevails?  It  is  noted  that  we  still  focus  our  analysis  on  the 
forwarding  game  (i.e.,  single  hop  forwarding).  A  good  literature  of  incentive-compatible  and  strategyproof 
collusion  resistance  routing  can  be  found  in  [26,  45,  46,  53]. 

3.3.1  Collusion  Resistance 

Definition  18.  Collusion  is  a  group  of  players  working  together  to  maximize  their  own  payoffs  regardless 
of  the  social  optimum.  A  strategy  s'^  is  a  colluding  strategy  if  and  only  if 

where  is  any  strategy  other  than  s^.  R  is  called  a  strict  colluding  strategy  if  the  inequality  holds. 

We  consider  a  pure  strategy  profile  s*  which  is  subgame  perfect  and  enforce  cooperation  on  the  equilib¬ 
rium  point  (e.g.,  CORE).  The  anti- collusion  game  is  a  game  that  played  among  players  adopting  colluding 
strategy  s^  and  cooperative  strategy  s*.  The  aim  of  the  anti-collusion  game  is  to  suppress  collusion  and 
achieve  cooperation.  In  the  rest  of  this  section,  we  are  interested  in  find  the  conditions  that  ensure  the 
outcome  of  the  anti-collusion  game  is  full  cooperation. 

Let  Xc  be  the  population  share  of  a  strict  colluding  pure  strategy  profile  The  following  lemma  gives 
an  upper  bound  on  Xc- 

Lemma  8.  A  cooperation  enforcement  strategy  s*  is  collusion  resistant  if  and  only  if 

^  ^  _ Ui{s\.s^)-Ui{.sf.s^) _ 

Ui{sf  s'^)  +  Ui{s*,s*)  -  Uiisf  s*)  -  Ui{s*,  s'^) 

Proof:  We  assume  the  number  of  nodes  in  the  game  is  n.  Eor  the  group  of  cooperating  nodes,  the  group’s 
total  payoff  is 

U*  =  n(l  —  Xc)Ui{s*,  s*)  nXcUi{s* ,  s'^).  (38) 


(37) 
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The  total  payoff  for  the  group  of  colluding  nodes  is 

tj^  =  n(l  -  Xc)Ui{s^,  s*)  +  nXcUiis^,  s^).  (39) 

Collusion  resistance  requires  that  U*  >  Therefore, 

Xc[Ui{s\s^)  -  Ui{s\  s*)  +  Uiis^,  s*)  -  Ui{s^,  >  Uiis^,  s*)  -  Ui{s\  s*). 

Since  subgame  perfect  Nash  equilibrium  requires  Ui{s*,  s*)  >  s*)  and  strict  colluding  infers  lJi{s^,  s'^)  > 

Ui{s*,s'^),  we  get  equation  (37).  □ 

3.3.2  Coalition  Formation 

Lemma  8  shows  that  in  order  to  resist  collusion,  the  colluding  node  population  should  be  kept  under  a 
threshold.  However,  when  the  games  are  played  over  time,  the  population  of  different  groups  (i.e.,  cooperative 
or  colluding)  is  highly  dynamic.  We  apply  evolutionary  game  theory  [48]  in  our  following  analysis  to  capture 
the  dynamics  on  population. 


Definition  19.  Let  A  be  a  strategy  set,  when  strategies  Sx,Sy  G  A.  Sx  is  an  evolutionarily  stable 
strategy  (ESS)  if  for  every  strategy  Sy  ^  Sx  there  exists  some  ly  €  (0, 1)  such  that 

u[sx,  esy  +  (1  -  e)sy]  >  u[sy,  esy  +  (1  -  e)sa;] 


for  all  e  G  (0,  Cy). 


Proposition  1.  A^^^  =  {sa,  g  A^^  :  u{sx,Sy)  >  u{sy,Sy),  M  Sy  ^  Sa:};  where  A^^  denotes  the  set  of 
Nash  Equilibrium  strategies. 


We  consider  the  same  s*  and  assume  it  is  ESS.  We  denote  x*  as  the  population  share  of  nodes  adopting 
s*,  i.e.,  group  of  cooperative  nodes.  It  is  clear  x*  +  Xc  =  1. 

According  to  evolution  theory,  the  dynamics  for  the  population  of  x*  is 


X*  =  [u(s*,  s^)  —  u{s'^,  s'^)]Xh, 

Let  Ma  represent  the  payoff  matrix  when  s*  plays  s'^. 

_  /  u{s*,s*)  u{s*,s^)  \ 

^  \u{s^,s*)  u{s^,s^)  ) 

This  matrix  also  holds  true  for  the  player  plays  s'^.  Applying  Ma  to  equation  (40),  we  get 
X*  =  [(u(s*,  s*)  -  «(s‘^,  s*))x*Xc]x*  +  [(r(s*,  s^)  -  u(s‘^,  s‘^))x*Xc]Xc 

-  (ur^=t=  U2  Xf;)X;,:Xf, 

where  oi  =  u{s* ,  s*)  —  u{s^,  s*),  a-2  =  u(s‘^,  s'^)  —  u{s* ,  s'^). 


(40) 


(41) 


Lemma  9.  The  cooperation  enforeement  strategy  s*  leads  to  +1  evolutionarily  stable  state  on  population 
share  if  and  only  if  s*  is  ESS  or  the  initial  population  share  x[]  <  ai/(ai  +  02). 


Proof:  For  any  x*  <  1,  the  +1  state  can  only  be  reached  if  x*  >  0.  Since  Xc,x*  >  0,  it  requires 

aix*  —  a2Xc  >  0.  If  0102  <  0.  The  only  possibility  is  oi  >0,  02  <  0,  and  indicates  s*  is  ESS  (Proposition  1). 

Ifai02  >0.  xO  <  □ 
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It  can  be  noted  that  in  case  s*  is  not  ESS,  or  are  the  mixed  strategy  Nash 

Equilibrium  values.  It  suggests  that  when  no  ESS  exists,  the  strategy  with  the  initial  population  greater 
than  the  equilibrium  value  prevails. 

Summarizing  the  discussions  above,  we  have  the  following  theorem  on  a  general  cooperation  enforcement 
strategy. 

Theorem  2.  A  cooperation  enforcement  strategy  s*  enforces  the  prevalence  of  cooperation  if  and  only  if  it 
satisfies  either  of  the  following  two  conditions: 

•  s*  is  ESS, 

•  <  minf^^ _ ^ _ 

Remarks:  In  the  second  condition,  both  terms  in  the  minimization  function  are  the  same  if  d  =  1  for  the 
repeated  game.  It  also  suggests  that  when  all  the  players  in  the  game  stick  to  continuous  participation,  the 
colluding  nodes  will  be  enforced  to  be  cooperative  with  time.  Thus  collusion  resistant  is  bona  hde  cooperation 
coalition  formation.  The  sensitivity  of  the  convergence  of  the  formation  (i.e.,  i:*)  will  be  determined  by  the 
payoff  matrix  entries. 

4  Coperation  Enforcement  with  Heterogeneous  Unreliable  Channel 

When  we  extend  our  discussions  in  last  section  to  include  the  case  of  multi-hop  data  communication  in 
wireless  networks,  an  important  challenge  in  such  scenario  is  that  different  nodes  will  experience  hetero¬ 
geneous  channel  conditions.  In  this  section,  we  provide  a  game  theoretic  solution  to  enforce  cooperation 
in  a  multi- hop  wireless  network  in  the  presence  of  heterogeneous  channel  noise.  We  focus  on  the  packet 
forwarding  process  and  model  it  as  a  hidden  action  game  with  imperfect  private  monitoring  in  Section  4.1. 
In  Section  4.2,  we  propose  a  state  machine  based  strategy  to  reach  Nash  Equilibrium.  The  equilibrium  is 
proved  to  be  a  sequential  equilibrium  with  carefully  designed  system  parameters.  Eurthermore,  in  Section  4.3 
we  extend  our  discussion  to  a  general  multi-hop  wireless  network  scenario  by  refining  the  strategy  profiles 
to  handle  multi-hop  packet  forwarding. 

4.1  The  Packet  Forwarding  Game  under  Heterogeneous  Noise 

We  begin  our  analysis  with  a  review  of  the  classical  two-player  packet  forwarding  problem  [14,  17].  As  shown 
in  Eigure  3,  we  consider  two  data  sessions;  (i)  As  to  Ad  through  Bg  and  (ii)  Bg  to  Bd  through  As-  If 
the  channel  is  perfect  (loss  free),  based  on  the  actions  As  and  Bs  take,  they  will  obtain  different  payoffs 
as  listed  in  Table  5.  The  entries  in  the  matrix,  i.e.,  R,  S,  T,  P,  not  only  determine  the  payoffs  players  can 
obtain,  but  also  indicate  the  type  of  the  games.  Eor  example,  in  the  well-known  Prisoner’s  Dilemma  [38], 
which  also  characterizes  the  scenario  of  packet  forwarding,  T  >  R  >  P  >  S .  It  is  noted  that,  depending  on 
how  the  packet  system  is  configured,  the  values  in  the  matrix  might  be  different.  In  this  research,  instead  of 
using  a  specihc  payoff  matrix  like  [17],  we  assume  the  matrix  has  a  general  format  as  shown  in  Table  5,  and 
later,  we  will  show  how  the  values  in  the  matrix  affect  the  equilibrium  properties  of  our  strategy.  With  the 
payoff  matrix,  it  is  clear  that  for  an  action  a  =  {aAg,aBs)  =  {Forward,  Discard),  the  payoff  vector  would 
definitely  be  it  =  {uAg,UBg)  =  {S,T). 
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Figure  3:  Two  player  packet  forwarding  game  model. 


Node  Bs 


Node  As 

Forward  Discard 


Forward 

Discard 


R  R 

S  T 

T  S 

P  P 

Table  5;  Payoff  matrix  of  two  player  packet  forwarding  game.  R=Reward,  S=Sucker,  T=Temptation, 
P= Penalty. 


However,  when  we  bring  in  the  channel  loss,  even  if  both  nodes  take  the  same  action  as  above,  the 
payoff  vector  is  not  likely  to  remain  the  same.  For  node  As,  it  forwards  Bs^s  packet  to  B^,  but  the 
forwarding  action  might  fail  due  to  the  channel  noise,  and  B^)  does  not  receive  the  packet.  Since  H^’s  payoff 
is  determined  by  whether  B^  receives  the  packet,  from  node  Rs’s  perspective.  As  is  playing  Discard  even 
though  its  action  was  Forward.  Thus,  the  payoff  vector  now  is  u  =  {P,P).  Nonetheless,  node  Bs  cannot 
directly  observe  H^’s  action.  This  is  because  what  Bs  can  observe  relies  only  on  the  channel  between  it  and 
As  and  this  channel  is  different  from  that  between  As  and  Bjo  due  to  interference.  Also,  we  do  not  assume 
that  Bd  can,  through  some  mechanism,  inform  Bs  about  whether  the  packet  is  received  or  not.  Hence, 
what  the  nodes  can  do  is  to  monitor  their  own  payoffs  {realized  payoff),  and  indirectly,  form  a  belief  on 
what  others  have  done.  Based  on  the  same  payoff  matrix  in  Table  5,  if  the  noise  is  presented  as  a  channel 
loss  probability  pe,  we  can  calculate  the  probabilities  associated  with  actions  and  payoffs.  In  Table  6  ,  we 
list  the  probabilities  as  node  i  plays  the  first  action  and  its  opponent  plays  the  second  action  in  the  action 
profiles.  With  these  probabilities,  we  can  further  calculate  the  expected  payoff  of  a  node.  For  example,  when 
a=  {Forward,  Discard),  the  expected  payoff  vector  is  m  =  ((1  —  pe)S  +  PeP,{l  —  Pe)T  +  peP). 


Actions 


Node  i’s  payoffs 


R 

s 

T 

P 

(F,  F) 

(1  -Pe)^ 

Pe{l-Pe) 

Pe{l-Pe) 

Pe 

(D,  F) 

0 

0 

1  -Pe 

Pe 

(F,  D) 

0 

1  -Pe 

0 

Pe 

(D,  D) 

0 

0 

0 

1 

Table  6:  Payoff  probabilities  for  given  action  profiles. F=Forward,  D=Discard. 


Let  us  now  formally  define  the  packet  forwarding  game  under  noise. 

Definition  20.  A  packet  forwarding  game  (V)  under  noise  is  a  quadruple  (I,  A,D,u),  where 
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•  /  =  denotes  the  set  of  nodes. 

•  A  is  a  space  of  actions  (ai)  a  node  (i)  can  take. 

•  kl  is  a  space  of  observed  signals.  For  every  action  a*  G  Ai  node  i  takes,  it  observes  a  signal  uoi  €  kli. 
Both  action  a*  and  signal  u)i  are  node  i ’s  private  information.  The  probability  distribution  of  private 
signal  u  =  (wi,  ...,ujn  depends  on  the  action  profile  a  =  (ai,  ...,an)  and  the  noise  in  the  channel.  It  is 
denoted  as  p{uj\a). 

•  u  presents  the  realized  payoffs.  For  node  i,  its  expected  payoff  is  given  by  gfia)  =  Tji^p{uj\a)ui{ai,uji). 

Often  times,  this  game  is  played  repeatedly  as  nodes  have  a  number  of  packets  to  be  forwarded.  From  a 
discounted  repeated  game  [38]  perspective,  the  discounted  payoff  for  node  iisUi  =  gi{a{t)) ,  where  a{t) 

is  the  action  taken  at  time  t  and  6  G  (0, 1)  is  the  discount  factor.  The  discount  factor  infers  the  preference 
of  time  or  patience.  A  large  5  shows  a  node’s  patience  in  the  game  and  good  valuation  of  payoffs  it  gets 
in  future  stages,  while  a  small  6  means  that  the  node  is  more  eager  for  immediate  payoffs  and  has  higher 
probability  of  leaving  the  game  after  each  stage. 

The  above  definition  differs  from  most  existing  game  models  in  the  sense  that  a  node  cannot  directly 
observe  others’  actions,  rather,  it  observes  through  a  private  signal  ^  associated  with  the  action  profiles 
played.  As  a  matter  fact,  existing  models  can  be  regarded  as  a  special  case  when  u;  =  a  for  all  nodes 
(all  nodes  have  perfect  public  observation  of  others’  actions),  oi  oji  =  =  ...  =  oJn  a  (all  nodes  have 

imperfect  public  observation  of  others’  actions).  While  the  existing  models  either  ignore  the  noisy  nature  of 
the  wireless  channel  or  need  some  sort  of  communications  among  nodes  to  exchange  the  observations,  our 
model  eliminates  such  pre-assumptions  and  hence  most  appropriately  abstracts  an  ad  hoc  network  scenario. 

The  outcome  of  a  single  stage  (static)  game  can  be  characterized  by  the  well-known  Nash  Equilibrium 
[38].  In  a  Nash  Equilibrium,  no  player  can  unilaterally  deviate  from  the  equilibrium  strategy  to  gain  more 
payoff;  or  in  other  words,  every  player  is  playing  the  best  response  to  others.  When  the  same  game  is  played 
repeatedly  for  finite  or  infinite  number  of  times,  the  notion  of  subgame  is  introduced  so  that  the  game  can  be 
viewed  as  a  subset  of  the  original  game  starting  at  a  certain  stage,  with  a  perfectly  or  imperfectly  monitored 
history.  The  repeated  game  can  be  analyzed  by  finding  the  Subgame-Perfect  Nash  Equilibrium  (SPNE), 
which  consists  of  a  series  of  Nash  Equilibria  at  any  subgame  stages  [38].  From  our  modeling  of  the  packet 
forwarding  game,  in  order  for  each  node  to  make  best  response  to  others’  actions  that  are  hidden,  it  first 
needs  to  form  a  belief  on  what  the  others  have  done.  A  profile  of  strategies  and  beliefs  makes  an  assessment. 
To  further  refine  SPNE  given  the  assessment,  sequential  equilibrium  [24]  is  introduced. 

Definition  21.  Sequential  Equilibrium^  is  an  assessment  of  strategy  vr  and  belief  p,,  which  satisfies  the 
following  properties: 

•  Strategy  Sensibility:  When  the  beliefs  are  fixed,  no  player  prefers  at  any  point  to  change  her  part  of 
strategy  in  vr  given  the  information  set,  i.e.,  vr  maximizes  the  expected  payoffs. 

•  Belief  Sensibility:  Those  information  sets  can  be  reached  with  positive  probabilities  (p)  given  vr. 

•  Consistency:  The  assessment  should  be  a  limit  point  of  a  sequence  of  the  mixed  strategies  and  associated 
sensible  beliefs,  i.e.,  (vr,  ^)  =  lim„^oo(7rn, /^n)- 

^It  is  noted  that  the  signal  here  does  not  necessarily  mean  the  physical  signal  in  the  communication  channel,  but  rather,  it 
refers  to  all  the  possible  observations  a  node  can  make,  e.g.,  the  payoffs. 

■^Please  refer  to  [24]  for  a  more  formal  definition. 
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Thus,  in  order  to  enforce  cooperation  in  wireless  networks  with  noisy  channel,  it  is  highly  desirable  that 
any  adopted  strategies  and  their  associated  beliefs  constitute  the  sequential  equilibrium.  Also,  this  sequential 
equilibrium  is  attainable  by  designing  the  parameters  to  calculate  the  beliefs.  To  further  clarify  the  concept 
of  sequential  equilibrium  in  the  packet  forwarding  game,  we  assume  that  although  nodes  cannot  perfectly 
observe  the  actions  of  others,  they  have  beliefs  about  what  the  opponents  have  done.  Based  on  the  beliefs, 
they  take  corresponding  actions  in  future  games.  The  sequential  equilibrium  requires  that  the  nodes  form 
their  beliefs  in  such  a  way  (e.g.,  following  Bayesian  rules)  that  the  states  associated  with  the  beliefs  can 
be  reached  with  positive  probabilities.  In  addition,  the  consequent  actions  taken  given  the  beliefs  are  the 
best  response  to  the  current  state.  A  possible  solution  to  attain  the  equilibrium  (although  not  mentioned 
explicitly  by  the  authors)  is  proposed  in  [18],  where  one  node  plays  the  Grim  Trigger  strategy  and  the  other 
one  plays  the  defection  strategy,  and  the  beliefs  are  updated  at  every  stage  of  the  game.  However,  the 
belief-based  approach  requires  complicated  calculations,  and  moreover,  their  modeling  on  the  effect  of  the 
channel  is  not  thoroughly  investigated,  as  the  Discard  action  can  never  be  observed  as  Forward.  In  this 
section,  our  goal  is  to  design  a  more  efficient  way  to  attain  sequential  equilibrium  under  the  noisy  channel 
we  have  already  defined.  Our  approach  is  different  from  [18]  in  both  design  notion  and  methodology. 

4.2  State  Machine  Based  Forwarding  Approach 

In  this  section,  we  demonstrate  how  to  construct  a  sequential  equilibrium  using  state  machine  based  for¬ 
warding  approach.  It  is  noted  that  a  larger  space  of  other  cooperation  enforcement  strategies,  as  well  as  the 
associated  equilibria  with  noisy  channels  have  been  analyzed  in  [37,  47] .  For  the  sake  of  clarity,  we  consider 
the  packet  forwarding  game  between  two  nodes. 

First,  we  define  two  types  of  observable  signals  u>.  Punishment  signal  and  Reward  signal.  We  define  that 
a  Punishment  signal  is  observed  when  the  node’s  realized  payoff  is  P,  otherwise  a  Reward  signal  is  observed. 
It  is  noted  that  a  punishment  signal  can  be  observed  even  if  node  is  playing  cooperatively.  Table  6  can  be 
used  to  calculate  p{uj\a)  given  the  action  profiles.  However,  the  observations  are  private. 

Further,  let  us  consider  a  strategy  with  two  states,  C  (Cooperative)  and  N  (Non-Cooperative).  The 
strategy  begins  with  state  C  and  operates  with  the  following  transition  probabilities. 

•  When  the  node  is  in  State  C,  play  Discard  with  a  small  probability  qc-  If  Discard  is  taken  and 
Punishment  is  observed,  transit  to  N  with  probability  pc-  Stay  in  C,  otherwise. 

•  When  the  node  is  in  State  N,  play  Discard  with  a  large  probability  qn-  If  Discard  is  taken  and 
Reward  is  observed,  transit  to  C  with  probability  pjsf.  Stay  in  N,  otherwise. 

The  state  machine  based  forwarding  approach  is  illustrated  in  Figure  4.  It  is  noted  that  in  this  approach, 
there  is  always  uncertainty  on  which  state  the  opponent  node  is,  and  hence  the  beliefs  are  updated  all  the 
time.  In  order  for  this  design  to  reach  sequential  equilibria,  it  is  important  that,  with  any  history,  the  state 
machine  is  a  best  response  to  itself,  regardless  of  the  beliefs.  In  other  words,  the  problem  is  to  find  whether 
there  is  a  set  of  the  system  parameters  (transition  probabilities),  such  that  node  i  does  not  gain  different 
payoffs  by  choosing  either  actions,  i.e.,  Forward  (F)  or  Discard  (D),  no  matter  what  state  its  opponent  node 
—i  is  in. 

The  design  problem  is  hence  reduced  to  finding  the  system  parameters  {qc,  qN,  Pc,  Pn)  to  make  the 
strategy  itself  a  best  response  to  the  state  machine.  We  denote  Vc  and  Vm  as  the  average  repeated  game 
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p( Punishment  \D) 


Figure  4;  Forwarding  state  machine. 


payoffs  for  node  i  when  node  —i  is  in  state  C  and  N  respectively.  From  Bellman  equations  [6],  we  can  write 


the  following  equations. 

When  node  i  plays  F, 

Vc  =  (l-S)l(l-qc)R  +  qcS]  +  6l(l-qcPePc)Vc  +  gcPePcVN]  (42) 

Vn  =  {1  -  d)[{l  -  qN)R  + qNS]  +  5{{1  -  Pe)pNqNVc  +  [I  -  (I  -  Pe)PNqN]VN}  (43) 

Similarly,  if  node  i  plays  D, 

Vc  =  {I  -  S)[{1  -  qc)T  +  qcP]  +  S[{1  -  qcPcWc  +  qcPcVw]  (44) 

VN  =  {l-d)[{l-qN)T  +  qNP]+5VN  (45) 


For  node  i  to  be  indifferent  between  F  or  D,  Equations  (42)  and  (44)  should  be  equal  when  node  —i  is 
in  state  C,  or  equations  (43)  and  (45)  should  be  equal  when  node  —i  is  in  state  N.  Thus,  the  solutions  for 
above  equations  represent  the  equilibria  of  the  state  machine.  The  following  theorem  provides  one  of  the 
solutions. 

Theorem  3.  For  the  state  machine  based  forwarding  approach,  there  is  a  sequential  equilibrium  for  large  5, 
when  pe  <  o,nd  T  >  R. 

Proof:  From  equations  (42)  and  (44)  we  have 

(1  -  5)[(1  -  qc){T  -  R)+pc{P-S)]  =  5qcPc{l-Pe){Vc  -  Vn)  (46) 

Thus,  from  equation  (42),  we  can  further  derive 

Ec  =  (1  -  qc)R  +  qcS  +  T^[(l  -  qc){R  -  T)  +  qc{S  -  P)]  (47) 

J.  Pe 

Similarly,  from  equations  (43)  and  (45)  we  have 

(1  -  5)[(1  -  qN){T  -  R)+p^{P-S)]  =  5qNPN{l-Pe){Vc  -  Vn)  (48) 

Ev  =  (1  -  qN)T  +  qNP  (49) 
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From  the  observation  of  equations  (46)-(49),  we  are  left  with  four  variables  and  two  equations,  which 
implies  there  are  two  free  variables.  To  find  a  possible  solution  to  the  equations,  to  begin  with,  we  consider 
Pc  as  a  free  variable  and  set  pc'  =  1.  The  reasoning  is  as  follows.  If  there  is  a  solution  of  the  above  equations 
with  PC'  <  1)  we  can  always  decrease  qc  in  equation  (47)  to  increase  Vc-  However,  this  will  lead  us  to 
further  increase  pc  to  balance  equation  (46).  Thus,  pc  can  be  increased  to  1  but  never  exceed  1  as  it  is  a 
probability.  The  same  reasoning  can  be  applied  to  the  other  free  variable  and  we  let  =  1.  For  <  1, 
increasing  q^  can  lead  to  a  decrease  in  p^  in  equation  (49)  to  balance  the  decreased  Vn  from  equation  (48). 

The  above  analysis  reduces  to 

Vn  =  P  (50) 


and 


Pn  = 


qc{P-S) 


{l-qc){T-R)  +  qc{P-S)' 

It  is  not  hard  to  see  that  pTv  £  [0, 1]. 

Putting  equations  (51)  and  (47)  back  to  equation  (48),  we  obtain  a  quadratic  equation  of  qc  as 


(51) 


{5il-pe)[R-S-,^{T-R  +  S-P)]}q^c 

J-  Pe 

+{,5(1  -P,)[P-R-  -^{T  -  R)]  +  {l-5) 

J-  Pe 

{P-S-T  +  R)}qc  +  {1-5){T-R)=0  (52) 


One  root  of  equation  (52)  is  easy  to  get  as  gc  =  0  when  5  =  1.  To  find  the  relationship  between  qc  and  5, 
we  check  the  existence  of  implicit  function  (F)  around  {qc,  <5)=(0,  1)  as 

=  {1  -  Pe)lP  -  R  -  -  R)].  (53) 


Since  pe  <  equation  (53)^^  0,  and  thus  the  Implicit  Function  Theorem  can  be  applied  around  5  =  1 

such  that 


dqc 

d5 


dF\ 

dS  l(gc,(5)=(0,l) 

OF  I 

dqc  I(9c,,5)=(0,1) 

T-R 


{1-P,)[P-R-^^{T-R)Y 


(54) 


From  the  assumptions,  we  know  that  equation  (54) <  0,  which  essentially  states  that  there  exists  a  value 
qc  G  (0, 1),  for  a  large  enough  5  such  that  gc  ^  0  as  5  ^  1.  Hence,  a  set  of  parameters  satisfying  the  system 
requirement  is  obtained  around  5  =  1. 

Further,  with  the  set  of  parameters,  the  average  payoff  is  updated  as 


Vc  =  lim  1(1  -  qc)R  +  gc>S'  +  [(1  -  qc)(P  -T)+qc(S-  P)]} 

qc^O  1  —  Pe 

=  R+ -^^(R-T)  >  P.  (55) 

1  -Pe 

Thus,  state  C  is  always  more  efficient  than  state  N.  In  addition,  when  the  nodes  are  updating  their  beliefs  on 
the  opponent,  it  will  always  assume  that  the  opponent  has  never  deviated  because  no  deviation  is  observable. 
The  consistency  requirement  is  satisfied  as  neither  node  tries  to  update  its  beliefs  about  others;  instead,  the 
nodes  play  the  best  response  strategies.  Hence,  we  have  proved  that  the  state  machine  based  forwarding 
approach  has  a  sequential  equilibrium  for  large  5,  when  pe  <  T  >  R.  □ 
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In  the  proof,  we  showed  that  with  the  system  parameters  (state  transition  probabilities)  in  [0,1],  qc 
can  be  arbitrarily  close  to  0  as  <5  goes  to  1;  and  the  cooperative  state  is  always  strictly  Pareto  superior 
to  the  non-cooperative  state.  Moreover,  the  average  payoff  of  the  cooperative  state  is  arbitrarily  close  to 

R-^St-r). 

By  further  manipulating  the  constraints  in  Theorem  3,  we  have  the  properties  as  follows. 

Corollary  4.  In  order  to  reach  sequential  equilibrium,  R  <  T  <  R^{R  —  P). 

Corollary  5.  In  a  sequential  equilibrium,  the  average  payoff'  of  the  cooperative  state  is  lower  bounded  by 

P  and  upper  bounded  by  R  —  {T  —  R). 

Corollaries  4  and  5  infer  that  the  values  of  the  elements  in  the  payoff  matrix  can  help  to  reach  the 
sequential  equilibrium,  and  at  the  same  time  pushing  the  average  payoff  to  the  Pareto  frontiers.  In  particular, 
we  can  find  a  small  enough  e  such  that  T  =  R  +  e  to  relax  the  constraint  on  channel  loss  in  Theorem  3. 

Corollary  6.  If  T  =  R  +  e,  when  e  O"'',  a  sequential  equilibrium  can  be  reached  regardless  of  the  noise 
in  the  channel,  and  the  average  payoff  of  the  cooperative  state  Vc  ^  R. 

Proof:  Since  T  =  ii  -|-  e,  in  Theorem  3,  in  order  to  reach  sequential  equilibrium 

Also,  limg^o+  '^tEp^  =  1-  Since  pe  G  (0,1),  for  e  ^  0^,  it  essentially  relaxes  the  constraint  on  pe;  thus 
Pe  can  take  any  value  in  (0,1).  From  Corollary  4,  T  <  R^{R  —  P),  which  derives  Hence, 

Vcie)  =  lim,^o+  R  -  =  R-  □ 

4.3  Multi-hop  Packet  Forwarding 

So  far,  we  have  discussed  the  two-player  case  of  packet  forwarding  when  the  observation  is  imperfect  and 
private.  However,  in  a  wireless  network,  packet  forwarding  from  source  to  destination  usually  requires 
multiple  hops.  In  this  section,  we  model  the  multi- hop  packet  forwarding  as  a  multi-player  packet  forwarding 
game.  We  investigate  the  interactions  among  the  players  and  analyze  the  cooperation  strategies  based  on 
two-player  approaches  discussed  in  Section  4.2. 

Before  we  study  the  cooperation  strategies  in  this  scenario,  we  need  to  characterize  a  multi-hop  wireless 
network  and  model  the  packet  forwarding  game,  in  which  multiple  nodes  participate  in  a  hop-by-hop  manner. 
We  consider  a  network  where  nodes  are  mobile  within  a  certain  area.  The  nodes  are  selfish  but  not  malicious. 
Each  of  nodes  can  be  a  source  of  a  data  session  which  generates  packets  and  sends  them  to  a  specific 
destination.  Thus,  the  multi-player  packet  forwarding  game  can  be  modeled  from  the  packet  forwarding 
game  under  heterogeneous  noise. 

Definition  22.  The  multi-player  packet  forwarding  game  is  a  series  of  packet  forwarding  games  (as  defined 
in  Definition  20)  T  =  (I,  A,  H,  u)  under  noise  where  for  each  data  session: 

•  I  =  1,2, ...,  n  denotes  a  set  of  nodes  that  are  candidates  to  form  a  packet  forwarding  route  from  source 
to  destination. 

•  A  =  {Forward,  Drop) . 

•  Q  is  the  space  of  the  observed  signals  and  it  is  private  to  the  node  itself.  No  private  observation 
exchange  is  assumed.  A  node  only  plays  with  its  immediate  neighbors  in  I,  and  cannot  obtain  any 
information  beyond  one  hop. 
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•  The  payoffs  are  defined  in  Table  5  for  per  unit  packet  forwarded.  A  data  session  consists  of  multiple 
unit  packets  and  thus  the  forwarding  game  is  repeated. 

4.3.1  Multi-hop  Packet  Forwarding  Strategy  Design 

Theorem  3  and  Corollary  6  state  that  it  is  possible  for  the  two-player  forwarding  game  to  reach  a  sequential 
equilibrium  where  cooperation  can  be  enforced.  Since  a  node  can  only  interact  with  its  one  hop  neighbor,  for 
a  route  from  source  node  to  the  destination,  it  is  natural  that  if  the  games  played  at  each  of  the  hops  reach 
sequential  equilibria  according  to  our  model,  all  the  nodes  are  cooperative  in  the  multi-hop  forwarding.  For 
each  data  session,  based  on  whether  there  is  a  dedicated  route,  the  multi-hop  packet  forwarding  strategy 
can  be  categorized  into  two  types:  routing  based  forwarding  and  hop-by-hop  forwarding. 

4.3.2  Routing  Based  Forwarding 

In  this  type  of  forwarding  strategy,  a  route  has  to  be  established  before  packet  forwarding  starts.  Thus,  a 
route  discovery  process  is  involved  and  the  source  node  knows  explicitly  the  intermediate  nodes  at  the  time 
of  route  establishment.  As  the  route  discovery  process  can  be  done  in  various  ways,  like  Ad  hoc  On-Demand 
Distance  Vector  Routing  (AODV)  [40]  and  Dynamic  Source  Routing  (DSR)  [19]  protocols,  a  route  selection 
mechanism  should  be  in  place  to  determine  which  nodes  are  chosen  to  form  the  route,  or  in  other  words,  who 
the  players  are  in  the  game.  The  criteria  in  route  selection  are  diversified.  Usually,  the  aim  of  the  routing 
protocol  and  the  type  of  application  determine  the  route  selection,  e.g.,  route  with  least  hops,  route  with 
least  traffic,  and  etc.  With  the  route  established,  the  packet  forwarding  games  can  then  be  played  along  the 
route.  For  any  node  possessing  a  packet,  it  plays  the  game  with  the  next  hop  node  on  the  route  following 
the  approaches  defined  in  Figure  4  with  C  as  the  initial  state.  If  for  some  reason,  the  next  hop  node  does 
not  participate  in  the  game  any  more,  the  route  is  broken  and  a  new  route  needs  to  be  set  up  and  the  games 
will  be  played  with  a  new  set  of  players.  The  routing  based  forwarding  strategy  can  be  listed  as  the  following 
steps: 

•  Step  1:  Routing  discovery  and  selection. 

•  Step  2:  Play  two-player  packet  forwarding  game  at  every  hop  along  the  route  for  each  of  the  data 
packet  to  be  delivered. 

•  Step  3:  Re-establish  a  new  route  if  the  original  route  is  down  and  repeat  the  process  from  Step  1. 

4.3.3  Hop-by-Hop  Forwarding 

In  the  situation  of  high  mobility,  route  changes  are  fast  and  it  is  costly  to  maintain  a  dedicated  route  from 
source  to  destination.  Thus,  it  is  possible  that  any  node  possessing  the  packet  determines  the  next  hop 
relay  node  as  long  as  the  packet  can  eventually  reach  the  destination  node.  A  typical  hop-by-hop  forwarding 
is  the  Greedy  Perimeter  Stateless  Routing  (GPSR)  [22]  in  which  every  hop  is  geographically  closer  to  the 
destination.  While  the  frequency  of  updating  the  next  hop  differs  from  one  application  to  another,  a  node 
can  possibly  have  several  next  hop  relaying  nodes  during  the  same  data  session.  From  the  game  theory 
perspective,  a  node  might  choose  to  play  the  same  forwarding  game  with  different  nodes  at  different  times. 
The  hop-by-hop  forwarding  strategy  simply  follows  two  steps. 

•  Step  1:  Choose  or  update  the  selection  of  next  hop  relay  node. 
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•  Step  2:  If  a  node  possesses  the  packet,  play  the  two-player  packet  forwarding  game  for  each  of  the 
data  packet  to  be  delivered. 

The  advantage  of  routing  based  forwarding  over  hop-by-hop  forwarding  is  that  the  source  nodes  have 
some  control  on  the  players  in  the  games.  This  might  be  important  because  the  source  node  can  specify 
which  nodes  it  wants  to  include  in  the  multi-player  packet  forwarding  game.  However,  the  extra  overhead 
on  route  discovery  and  maintenance  put  routing  based  forwarding  on  a  less  favorable  side  compared  with 
the  lightweight  hop-by-hop  forwarding. 

As  indicated  in  Section  4.2,  the  difficult  of  the  forwarding  game  lies  in  the  incapability  of  identifying  the 
state  the  opponent  node  is  in,  and  it  is  also  true  for  multi-hop  packet  forwarding  games.  However,  since  all 
nodes  are  selfish  and  only  interested  in  maximizing  their  own  payoffs,  individual  games  can  be  devised  so 
that  cooperation  strategy  means  optimal  payoffs  for  all  the  nodes  in  the  games.  We  simplify  the  multi- hop 
packet  forwarding  game  by  decomposing  it  to  multiple  two-player  packet  forwarding  games  between  a  node 
and  its  next  hop  relay  node.  The  following  theorem  shows  how  cooperation  is  enforced  in  multi-hop  packet 
forwarding. 

Theorem  4.  Multi-hop  packet  forwarding  strategies  (defined  above)  lead  to  a  sequential  equilibrium  for  the 
multi-player  paeket  forwarding  game. 

Proof:  First,  we  consider  the  hop- by- hop  forwarding  strategy.  For  each  sender  node  j  and  its  next  hop 
relay  node  k,  the  game  played  between  them  can  reach  sequential  equilibrium  as  suggested  in  Theorem  3. 
If  node  j  changes  its  criteria  in  selecting  the  next  hop,  and  choose  another  node,  the  new  node  pair  will  still 
play  the  two-player  packet  forwarding  game  and  hence  guarantees  a  sequential  equilibrium,  as  long  as  6  is 
close  to  I  and  the  system  parameters  are  set  accordingly.  Further,  when  we  consider  any  two  consecutive 
hops,  the  games  played  at  every  hop  generate  independent  equilibria  of  each  other.  Thus,  when  sequential 
equilibrium  is  achieved  at  every  hop,  the  hop- by- hop  forwarding  strategy  leads  to  sequential  equilibrium  for 
the  whole  multi-player  packet  forwarding  game. 

In  the  scenario  of  routing  based  forwarding  strategy,  when  a  route  is  selected,  the  players  and  games 
to  be  played  at  each  hop  are  fixed.  Since  sequential  equilibria  are  attainable  at  each  hop,  a  route  level 
sequential  equilibrium  on  multi-player  packet  forwarding  game  is  induced.  In  the  case  of  route  selection, 
if  no  payoff  metrics  are  involved  in  the  selection  decision,  reaching  the  equilibrium  solely  depends  on  the 
games  played  at  each  hop.  However,  if  the  route  with  maximum  payoff  is  selected  (e.g.,  VCG  [3]  and  its 
variant  protocols),  the  sender  node  has  no  incentive  to  switch  to  another  route  because  any  deviation  will 
incur  a  lower  payoff.  In  this  case,  the  sequential  equilibrium  can  be  achieved  on  the  route  with  maximized 
payoff. 

Moreover,  when  we  apply  Theorem  3  to  each  of  the  hops  of  two-player  forwarding  game,  consistency  is 
maintained  and  hence  the  same  for  the  multi-player  packet  forwarding  game.  Therefore,  multi-hop  packet 
forwarding  strategies  lead  to  a  sequential  equilibrium  for  the  multi-player  packet  forwarding  game. 

□ 

Furthermore,  the  corollaries  for  Theorem  3  can  be  extended  for  multi-player  packet  forwarding  game.  For 
example,  if  we  assume  a  forwarding  process  of  m  hops  with  Pe{m)  as  the  associated  channel  loss  probabilities, 
the  payoff  boundaries  for  the  forwarding  with  all  cooperative  node  are  [mP,  —  —  R)],  which 

is  also  strictly  bounded  by  mR. 
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5  Coexistence  with  Malicious  Nodes 


In  sections  3  and  4,  we  focused  on  how  to  enforce  the  selfish  nodes  to  cooperate.  However,  another  type 
of  misbehavior  in  wireless  network  exists,  viz,  malicious  attacking.  In  the  context  of  packet  forwarding, 
attacking  includes  altering  the  contents  of  the  packets,  denial-of-service  attacks,  malicious  packet  dropping, 
and  so  on.  Identification  and  isolation  of  malicious  nodes  in  a  distributed  system  is  a  challenging  problem. 
This  problem  is  even  more  aggravated  in  wireless  networks  because  the  unreliable  channel  makes  the  actions 
of  the  nodes  hidden  from  each  other.  Therefore,  a  regular  node  in  the  network  can  only  construct  a  belief 
about  a  malicious  nodes  through  monitoring  and  observation.  In  this  section,  we  use  game  theory  to  study 
the  interactions  between  regular  and  malicious  nodes  in  a  wireless  network.  In  Section  5.1,  we  model 
the  malicious  node  detection  process  as  a  Bayesian  game  with  imperfect  information  and  show  that  a 
mixed  strategy  perfect  Bayesian  Nash  Equilibrium  (also  a  sequential  equilibrium)  is  attainable.  While  the 
equilibrium  in  the  detection  game  ensures  the  identification  of  the  malicious  nodes,  we  argue  that  it  might 
not  be  profitable  to  isolate  the  malicious  nodes  upon  detection.  As  a  matter  of  fact,  malicious  nodes  can 
co-exist  with  regular  nodes  as  long  as  the  destruction  they  bring  is  less  than  the  contribution  they  make.  To 
show  how  we  can  utilize  the  malicious  nodes,  a  post-detection  game  between  the  malicious  and  regular  nodes 
is  formalized  in  Section  5.4.  Solution  to  this  game  shows  the  existence  of  a  subgame  perfect  Nash  Equilibrium 
and  the  conditions  that  achieve  the  equilibrium.  Further,  in  Section  5.5,  we  show  how  a  malicious  node  can 
construct  a  nested  belief  about  the  belief  held  by  a  regular  node.  By  employing  the  nested  belief  system, 
a  Markov  Perfect  Bayes-Nash  Equilibrium  is  reached  and  the  equilibrium  postpones  the  detection  of  the 
malicious  node. 

5.1  Malicious  Nodes  Detection  Game 

We  consider  a  wireless  network  consisting  of  Regular  and  Malicious  nodes.  By  regular  node  we  mean  that  a 
node  that  works  towards  the  common  goal  of  the  network.  Also,  it  is  rational  and  its  actions  are  governed 
by  an  underlying  utility  function.  On  the  other  hand,  a  malicious  node  aims  to  hamper,  disturb,  and  even 
attack  the  network.  Although  the  actions  of  a  malicious  node  is  also  determined  by  certain  utility  functions, 
such  functions  are  designed  to  bring  damages  to  the  network. 

Despite  the  two  types  of  nodes,  the  identity  (type)  of  a  malicious  node  is  not  directly  revealed  to  others. 
Instead,  the  types  can  only  be  estimated  or  conjectured  through  observing  actions.  To  identify  the  attacks 
and  malicious  nodes  in  the  network,  a  regular  node  can  monitor  the  actions  of  others.  However,  such 
monitoring  is  costly  (e.g.,  consumes  the  receivers’  own  resource)  and  a  node  cannot  afford  to  monitor  all 
the  time.  Moreover,  the  observations  might  not  be  accurate  because  of  the  noise,  e.g.,  wireless  channel  loss. 
Thus,  the  regular  nodes  do  not  monitor  the  network  all  the  time  and  during  those  times,  attacks  cannot  be 
identified. 

To  simplify  the  analysis,  our  research  focuses  on  the  packet  forwarding  process.  We  assume  that  node 
i,  or  the  sender  node,  has  a  packet  to  send  to  node  j,  or  the  receiver  node.  If  the  sender  node  is  regular, 
it  only  takes  the  action  Forward" .  If  the  sender  node  is  malicious,  it  can  choose  to  “Attack"  with  a  risk 
of  being  identified  or  “Forward"  (not  attack)  to  disguise.  We  further  assume  that  time  is  divided  into  slots 
and  nodes  take  their  actions  within  each  slot. 
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5.2  Detection  Game  Model 


To  abstract  the  interactions  among  the  nodes,  we  consider  a  two-player  game  played  by  the  sender  node  i 
and  the  receiver  node  j.  The  types  of  these  nodes,  9i  and  9j,  are  private  information.  Since  the  type  of  each 
player  is  hidden,  and  the  observation  is  not  accurate,  it  is  a  Bayesian  game  with  imperfect  information  [38]. 

To  model  the  process  of  detecting  the  malicious  nodes  in  the  network,  we  apply  a  special  category  of 
Bayesian  game  called  the  signaling  game.  A  signaling  game  is  played  between  a  sender  and  a  receiver.  The 
sender  has  a  certain  type  and  a  set  A4  of  available  messages  to  be  sent.  Based  on  its  knowledge  on  its  own 
type,  the  sender  chooses  a  message  from  A4  and  sends  it  to  the  receiver.  However,  the  receiver  does  not 
know  the  type  of  the  sender  and  can  only  observe  the  message  but  not  the  type.  Through  observation,  the 
receiver  then  takes  an  action  in  response  to  the  message  it  observed.  In  the  malicious  node  detection  game, 
the  sender,  node  i  can  be  either  regular  =  0  or  malicious  9i  =  1.  The  receiver,  node  j  is  always  a  regular 
node,  i.e.,  9j  =  0. 

The  action  profiles  a*  available  to  node  i  are  based  on  its  type.  For  9i  =  0,  a*  =  {Forward}.  For  9i  =  1, 
ai  =  {Attack,  Forward}.  The  receiver  node  j  has  the  option  to  monitor  if  node  i  is  attacking  or  not,  thus 
aj  =  {Monitor,  Idle}. 

To  further  construct  the  game,  we  define  the  following  values.  Let  gA  be  the  payoff  of  a  malicious  node 
if  it  successfully  attacks.  The  cost  associated  with  such  an  attack  is  ca-  For  the  receiver  node  j,  the  cost  of 
monitoring  is  cm  and  0  if  it  is  idle.  Hence,  for  the  action  profile  (a*,  aj)  =  {Attack,  Idle),  the  net  utility 
for  a  successful  attacking  node  i  is  gA  —  ca,  the  loss  for  node  j  is  —gA  due  to  the  attack.  Similarly,  if  the 
action  profile  is  (a*,  Oj)  =  {Attack,  Monitor),  the  attacking  malicious  node  i  losses  gA  +  ca,  and  the  net 
gain  for  node  j  is  gA  —  cm-  However,  if  a  malicious  node  chooses  not  to  attack,  the  cost  to  forward  a  packet 
is  cp,  which  is  the  same  cost  to  a  regular  sender  node.  Based  on  the  types  of  node  i  and  node  j,  the  payoffs 
matrices  are  presented  in  Table  8.  For  quick  reference,  the  notations  used  in  this  section  are  tabulated  in 
Table  7. 

In  addition,  in  our  model,  we  introduce  Pe  as  the  channel  loss  rate.  The  channel  unreliability  implies 
that  monitoring  can  be  accurate  with  probability  1  —  Pe.  We  also  denote  7  as  the  attack  success  rate. 

5.3  Equilibrium  Analysis  for  the  Stage  Game 

We  begin  our  analysis  on  the  malicious  node  detection  game  from  the  extensive  form  of  the  static  Bayesian 
game  as  illustrated  in  Figure  5.  We  consider  the  type  determination  of  node  i  when  9i  =  1  happens  with 
probability  (j).  To  solve  this  game,  we  are  interested  in  finding  the  possible  Bayesian  Nash  Equilibrium 
(BNE).  In  a  static  Bayesian  game,  the  BNE  is  the  Nash  Equilibrium  given  the  beliefs  of  both  nodes.  In  our 
case,  node  i  knows  for  sure  that  for  node  j,  9j  =  0,  however,  node  j’s  belief  about  node  i  is  that  9i  =  1  with 
probability  (j). 

First,  let  us  consider  pure  strategies  only.  Based  on  9i,  the  pure  strategies  available  for  node  i  are 
Gi  =  {{Attack  if  9i  =  1, Forward  if  9i  =  0),  Forward  V  9i}.  For  node  j,  the  strategy  set  is  aj  = 
{Monitor,  Idle}.  To  find  the  BNE,  we  let  ai  and  aj  play  with  each  other  and  derive  the  conditions  under 
which  neither  node  can  increase  its  utility  by  unilaterally  changing  its  strategy. 

Lemma  10.  In  the  malicious  node  detection  game,  there  is  a  malice  belief  threshold  (/>o,  such  that  no  pure 
strategy  BNE  exists  if  f  >  (pQ. 

Proof:  We  start  by  eliminating  a  trivial  pure  strategy  pair  {Forward  V  9i  Monitor).  Erom  Table  88(a),  we 
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Node  i 

(potential)  malicious  node,  attacker 

Node  j 

regular  node,  monitor 

9^ 

type  of  node  i,  1  for  malicious,  0  for  regular 

Ui ,  Uj 

payoff  of  node  i  or  j  in  the  stage  game 

gA 

gain  of  successfully  attack  for  node  i 

CA 

cost  of  any  attack  for  node  i 

CF 

cost  of  forward  (not  attack)  for  node  i 

CM 

cost  of  monitoring  for  node  j 

</> 

the  belief  of  node  i  being  malicious  in  stage  game 

7 

attack  success  rate  for  node  i 

Pe 

channel  error  rate 

a 

false  alarm  rate  for  node  j 

CLi  5  dj 

action  profile  for  node  i  or  j 

di{t) 

node  i’s  action  observed  by  node  j  in  stage  t 

Oi 

node  i’s  observation  of  its  payoffs 

Pji^i) 

belief  node  j  holds  about  node  i’s  type  in  the  dynamic  game 

Pi{Pj{Gi)) 

belief  node  i  holds  about  node  j’s  belief  in  the  dynamic  game 
(used  to  derive  the  MPBNE) 

(Ji^  (Tj 

node  i  or  j’s  strategy  profile 

p,  q 

random  variable  of  probability  node  i  attacks  or  node  j  monitors  in  the 
malicious  node  detection  game 

p*,  q* 

random  variable  of  probability  node  i  attacks  or  node  j  monitors  in  the 
post-detection  game 

p,  q 

random  variable  of  probability  node  i  attacks  or  node  j  monitors  in  the 
malicious  node  detection  game  with  node  i’s  nested  belief  model 

Ci 

coexistence  index 

* 

node  j’s  belief  about  node  i’s  type  when  node  i  uses  nested  belief  model 

Pj*i 

node  j’s  belief  about  node  i’s  attack  probability  when  node  i  uses 
nested  belief  model 

Pn 

the  value  of  node  i’s  attack  probability  in  MPBNE 

Ppbe 

the  value  of  node  i’s  attack  probability  in  PBE 

PSPNE 

the  value  of  node  i’s  attack  probability  in  SPNE 

Table  7:  Notations  used  in  malicious  nodes  coexistence  analysis. 


know  that  for  both  nodes,  they  can  improve  their  payoffs  by  deviating  from  the  strategy  pair.  We  further 
analyze  the  following  two  cases. 

Case  1:  ai  =  {Attack  if  6i  =  1,  Forward  if  9i  =  0).  For  node  j,  if  aj  =  Monitor,  the  expected  payoff 
is 


Uj{Monitor)  =  {qa  -  cm)(I>{1  -  Pe)  -  (t>Pe[l{gA  + cm)  +  {'^  -  i)cm]  -  {'^  -  (t>)cM  (56) 

where  each  term  represents  monitoring  the  attack  successfully,  failing  to  monitor  the  attack,  and  node  i  is 
regular  respectively.  If  aj  =  Idle,  the  expected  payoff  is 

Uj{Idle)  =  -(jrygA  (57) 

If  (57)>(56),  the  dominant  strategy  for  node  j  is  Idle.  Correspondingly,  for  node  i,  the  best  response  would 
be  {Attack  if  6i  =  1,  Forward  if  6i  =  0).  Thus  {ui,  aj)  =  {{Attack  if  9i  =  1,  Forward  if  9i  =  0),  Idle} 
is  a  BNE  under  the  condition  that  ({)  <  .  If  (57)  <(56),  or  (/>  >  ,  the  dominant 

strategy  for  node  j  is  Monitor,  however,  the  best  response  to  Monitor  for  node  i  is  Forward  V  9i.  Hence 
{ai,  aj)  =  {{Attack  if  9i  =  1,  Forward  if  9i  =  0),  Monitor}  is  not  a  BNE  under  the  condition  that 
^  _ 
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Node  j 


Nature 


Figure  5:  Stage  malicious  node  detection  game  tree. 
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Case  2:  (cTj,  aj)  =  {{Forward  V  9i,  Idle}.  If  node  j  chooses  not  to  monitor,  the  best  response  for  node 
i  is  to  Attack  if  =  1.  This  will  lead  to  the  previous  case  when  cf)  <  (i_p  •  Therefore,  there  is  no 

BNE  if  (cji,  cjj)  =  {{Forward  V  9i,  Idle}. 

To  sum  up,  the  pure  strategy  BNE  exists  if  and  only  if  cj)  <  equilibrium  strategy  profile 

is  (cjj,  (Tj)  =  {{Attack  if  9i  =  1,  Forward  if  9i  =  0),  Idle}.  In  other  words,  we  can  find  (po  =  ; 

such  that  no  pure  strategy  BNE  exists  if  (/>  >  0o-  □ 

Although  pure  strategy  BNE  exists,  it  is  not  practical  because  the  equilibrium  requires  node  j  to  be  Idle 
at  all  times,  and  hence  the  malicious  nodes  cannot  be  detected.  It  is  also  called  Pooling  Equilibrium  [38]  in 
which  the  receiver  has  no  clue  about  sender’s  type.  Therefore,  it  is  desirable  to  seek  a  mixed-strategy  BNE, 
and  obviously,  such  BNE  exists  when  cp  >  cpQ. 

Let  us  denote  p  as  the  probability  with  which  node  i  of  type  9i  =  1  plays  Attack  and  q  as  the  probability 
with  which  node  j  plays  Monitor.  To  find  the  mixed  strategy  BNE  of  this  game,  we  need  to  find  the  values 
of  p  and  q  such  that  neither  node  i  nor  j  can  increase  payoff  by  altering  the  actions.  Eor  the  mixed  strategy 
played  by  node  i,  the  payoff  of  node  j  playing  Monitor  is 

Uj{Monitor)  =  (pp['y{gA  -  cm){1  -  Pe)  +  {I  - 'y){l  -  Pe){gA  -  cm) 

-(I  -  'y)peCM  -  We{gA  +  Cm)]  -  (1  -  P)(pCM  -  (1  -  (p)cM 
=  (pP[gA  -  gAPe{'^  +  7)1  -  CM-  (58) 

If  node  j  plays  Idle, 

Uj{Idle)  =  -p-/(pgA- 

Thus,  in  the  mixed  BNE  strategy,  Uj  {Monitor)  =  Uj{Idle).  Thus  p 
node  j  plays  the  mixed  strategy,  the  payoff  of  node  i  playing  Attack  is 

Ui{Attack)  =  -{gA  +  CA){l-Pe)q  +  {gA-CA)l{l-q) 

+{9A  -  CA)iqPe  -  Cyl(l  -  l)Peq  “  ca{1  “  ?)(!  “  7) 

=  gS'ACPe  -  1)(1 +  7) +  5^7  -  CA-  (60) 

When  node  i  plays  Forward, 

Ui{Forward)  =  —cp.  (61) 

Hence,  to  obtain  q,  Ui{Attack)  =  Ui{Forward),  and  q  =  • 

To  sum  up  the  analysis,  we  state  the  following  lemma. 

Lemma  11.  The  malieious  node  detection  game  has  a  mixed  strategy  BNE  when  ai,  aj  =  {{Attack  with  ^ 

1,  Forward  if  9i  =  0),  Monitor  with  9'^'ccn  (p  >  (po- 

Lemmas  10  and  11  provide  us  with  the  conditions  under  which  BNE  can  be  attained.  One  of  the 
conditions  is  the  belief  of  malice  threshold  (pQ.  As  suggested  in  Lemma  10,  this  threshold  is  related  to  the 
channel  reliability  (1  —  Pe),  attack  success  rate  (7)  and  detection  gain  {gA/cM)-  In  the  pure  strategy  BNE, 
node  i  always  attacks  and  the  belief  of  node  j  on  node  i’s  malice  is  very  low  since  the  detection  gain  is 
usually  very  large  as  Pe,7  G  [0,1]-  However,  when  the  belief  grows  and  eventually  exceeds  the  threshold, 
the  mixed  strategy  BNE  requires  node  i  to  be  less  aggressive  in  attacking.  In  other  words,  the  equilibrium 
implies  node  i  should  know  about  node  j’s  belief  when  making  the  decision.  When  node  j  is  absolutely  sure 
about  node  i’s  type,  node  i’s  equilibrium  attack  probability  drops  to  the  value  of  the  belief  threshold. 


(59) 

=  Similarly,  when 
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5.3.1  Belief  Update  and  Dynamic  Bayesian  Games 


So  far,  the  analysis  on  the  malicious  node  detection  stage  game  has  shown  that  the  equilibrium  is  associated 
with  node  j’s  belief  on  node  i’s  type.  However,  the  difficulty  lies  in  the  assignment  of  the  belief  as  a 
priori  information  available  to  node  j.  Thus,  it  is  desirable  that  this  belief  can  be  accurately  presented  and 
dynamically  updated.  We  apply  dynamic  Bayesian  game  theory  to  discuss  how  the  belief  is  updated. 

We  assume  that  the  static  malicious  node  detection  game  is  repeatedly  played  at  every  time  slot,  and  we 
consider  the  infinite  repeated  game  without  discounting  (i.e.,  payoffs  in  every  stage/slot  have  equal  weight). 
In  addition  to  the  notation  dehned  in  the  stage  game,  we  introduce  n^j\0i  =  9i)  as  the  belief  node  j  holds 
about  6i  =  9i.  Since  node  j  is  always  a  regular  node,  lJ^f\9j  =  0)  for  all  t  >  0.  We  further  define  ai{t)  as 
the  action  node  i  plays  at  stage.  Node  j  may  monitor  node  i’s  actions  through  the  observed  signal  di{t). 
The  reasons  for  the  discrepancy  between  ai{t)  and  di{t)  are  the  observation  error  caused  by  the  channel 
unreliability  and  the  false  alarm  rate  {a)  caused  by  the  inaccuracy  and  limitation  in  the  detection  of  node 

j- 

Based  on  Bayes’  theorem,  we  construct  our  belief  update  rule.  If  node  j  is  continuously  monitoring,  its 
belief  on  9i  can  be  calculated  with  the  belief  it  holds  at  the  immediate  previous  stage  and  the  actions  it 
observed.  We  write  the  belief  at  the  {t  +  1)*^  stage  as: 

0-1-1  "i  {9i)P{di(t)\9i) 

(.+i)(o.)  ^  ;  'I  -y  _  ,82) 

where  0  is  the  space  of  all  possible  values  9i  can  take;  in  our  case  0  =  {0, 1}. 

For  each  of  the  terms  in  (62),  we  have  the  following  equations. 


P{di{t)  =  Attack\9i  =  1)  =  p{l  —  pe)  +  {1  —  p)a  (63) 

P{di{t)  =  Attack\9i  =  0)  =  a  (64) 

P{di{t)  =  Forward\9i  =  1)  =  ppe  +  (1  —  p)(l  —  a)  (65) 

P{di{t)  =  Forward\9i  =  0)  =  \  —  a.  (66) 


Since  node  j  does  not  monitor  node  i’s  actions  at  every  stage,  when  node  j  is  not  monitoring,  its  belief 
remains  the  same  at  the  next  stage.  Thus,  (62)  is  revised  as: 


[9i)  -  q 


pf\9^)P{di{tM 


+  (1  - 


(67) 


The  concept  of  belief  system  is  hence  introduced  to  describe  the  aforementioned  belief  building  and 
updating  process.  A  belief  system  is  a  function  that  assigns  each  information  set^  a  probability  distribution 
over  the  histories  in  that  information  set  [38] .  Although  in  our  discussions  above,  we  did  not  explicitly  state 
how  history  is  accounted  for  in  (62)  and  (67),  it  is  easy  to  observe  that  every  updated  belief  is  determined  by 
the  actions  node  j  observes  in  the  current  stage  and  the  belief  it  holds.  The  beliefs  are  further  determined  by 
the  actions  in  the  previous  stages  and  it  can  be  backtracked  to  the  initial  belief  and  the  subsequent  actions. 
Thus,  the  current  belief  and  observed  action  can  fully  represent  the  histories  in  the  information  sets,  and 
those  information  sets  can  be  reached  with  positive  probabilities  if  the  strategies  are  carefully  designed. 

^Recall  in  Definition  8,  an  information  set  is  a  set  of  all  the  possible  moves  that  could  have  taken  place  in  the  game  so  far, 
for  a  particular  player,  given  what  that  player  has  observed.  In  an  imperfect  information  game,  an  information  set  contains  all 
possible  states  in  the  history,  e.g.,  in  Figure  5,  the  dotted  lines  show  the  information  set  available  to  node  j. 
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With  the  belief  system,  the  games  are  played  in  a  sequential  manner.  As  the  game  evolves,  neither 
nodes  can  stick  to  the  very  same  strategy  at  every  stage  to  yield  the  most  payoffs.  Thus,  the  best  response 
strategies  are  dependent  on  the  current  beliefs  held  by  the  nodes.  Perfect  Bayesian  Equilibrium  (PBE)  can 
be  applied  to  characterize  the  aforementioned  dependency.  In  PBE,  the  belief  system  is  updated  by  Bayes’ 
rule.  PBE  also  demands  that  the  optimality  of  subsequent  play  given  the  belief.  Next,  we  show  how  to 
construct  a  PBE  in  the  dynamic  malicious  node  detection  game. 

We  first  show  the  existence  of  a  mixed  strategy  equilibrium  and  then  argue  the  infeasibility  of  the  pure 
strategy  equilibrium.  Consider  at  an  arbitrary  stage  k  of  the  game;  we  denote  as  the  probability 
node  i  of  type  6i  =  1  plays  Attack,  as  the  probability  node  j  plays  Monitor.  In  the  equilibrium, 
uf^\Attack)  =  uf^\Forward)  and  onitor)  =  u\’^\ldle).  In  particular, 

uf\af^  =  Attack\af^  =  Monitor)  =  -{gA  +  ca)(1  -  Pe)q^^^  +  {gA  -  ca)i{1  - 


+  {9A  -  CA)iq^^he  -  ca(1  -  l)Peq^^^  -  ca(1  -  q^^^){l  -  j)  (68) 

=  Forward\a^p  =  Monitor)  =  —cp.  (69) 

=  Monitor\a\^^  =  Attack)  =  pf\0i  =  l)p^^^['y{gA  -  cm)(1  -  Pe) 

+  (1  -  7)(1  -Pe){gA  -  cm)  -  (1  -  l)PeCM  “  lPe{gA  +  Cm)] 

-(1  -  p^^^)iif\ei  =  1)CM  -  pf\9r  =  0)CM  (70) 

uf\af'^  =  I  dle\al’'^  =  Attack)  = -p^^^gA'ypf\9i  =  1).  (71) 


The  solutions  to  the  above  equations  are 


pik) 

qik) 


CM 


pf\9i  =  1)5a(1  +  7)(1  -Pe) 
QAl  -  CA  +  Cp 

gA{l  -Pe)(l  +7)’ 


(72) 

(73) 


What  and  q^^'^  suggest  is  an  equilibrium  profile  This  profile  shows  the  sequential 

rationality  [16,  38],  that  is,  each  node’s  strategy  is  optimal  whenever  it  has  to  move,  given  its  belief  and  the 
other  node’s  strategy.  In  other  words,  for  any  alternative  strategies  and  cr'j^\ 


u. 


Ua 


{k)i(F)  (k). 


F)((Ak)  (k). 


.F), 


j 


j 


((W  7  W  )\9i,aiit),p)  A9i))  >u)  T(cj)  \  a>  ’)\ei,di{t),  p)  A^i)) 


(74) 

(75) 


Besides  sequential  rationality,  a  PBE  also  demands  that  the  belief  system  satisfies  the  Bayesian  conditions 
[16]. 

Definition  23.  ([16],  p331-332)  The  Bayesian  conditions  defined  for  PBE  are 

B(i):  Posterior  beliefs  are  independent.  For  history  h^^\pi{6i\9i,h^^'^)  =  //*(%lhW). 

B(ii):  Bayes’  rule  is  used  to  update  beliefs  whenever  possible. 

B(iii):  Nodes  do  not  signal  what  they  do  not  know. 

B(iv):  Posterior  beliefs  are  consistent  for  all  nodes  with  a  common  joint  distribution  on  6  given  h^*\ 


Our  proposed  belief  system  satisfies  the  Bayesian  conditions.  B(i)  is  satisfied  because  6j  =  0  all  the  time. 
Equation  (62)  is  derived  from  Bayes’  rule,  and  hence  B(ii)  is  also  satisfied.  B(iii)  is  fulfilled  because  node  i’s 
signal  is  determined  by  its  action  and  if  ai{k)  =  dfik),  pj{9i\ai{k),h^^'^)  =  pj{9i\di{k),  h^^^).  B(iv)  is  trivial 
in  our  game  because  no  third  player  exists. 
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The  analysis  on  Bayesian  conditions  and  sequential  rationality  serves  as  the  proof  of  the  following  theo¬ 
rem. 

Theorem  5.  The  dynamic  malicious  node  detection  game  has  a  perfect  Bayesian  equilibrium  that  can  he 
attained  with  strategy  profile 

Remark  1:  The  infeasibility  of  pure  strategy  PBE  is  proved  as  follows:  If  node  i  attacks,  the  best 
response  for  node  j  is  Monitor,  which  makes  node  i  non-profit  able  to  play  Attack.  If  node  i  plays  Forward, 
piF  =  0,  the  best  response  for  node  j  is  Idle  (i.e.,  q^^^  =  0).  However,  the  sequential  rationality  requires 
q^^'^  >  ,  which  leads  to  a  contradiction.  Therefore,  no  pure  strategy  PBE  exists  in  the  dynamic 

malicious  node  detection  game.  It  is  noted  that  the  infeasibility  of  the  pure  strategy  PBE  in  the  dynamic 
settings  should  not  be  confused  with  the  existence  of  a  pure  strategy  BNE  in  a  static  game  because  the  pure 
strategy  BNE  in  a  static  game  is  always  an  artifact. 

Remark  2:  The  proved  PBE  can  be  further  refined  to  Sequential  Equilibrium  [24].  In  the  sequential 
equilibrium,  the  Bayesian  conditions  are  extended  as  belief  sensibility  and  consistency.  The  belief  sensibility 
requires  the  information  sets  can  be  reached  with  positive  probabilities  (/i)  given  the  strategy  profile  a.  The 
consistency  demands  an  assessment  (it,  /x)  should  be  a  limit  point  of  a  sequence  of  the  mixed  strategies  and 
associated  sensible  beliefs,  i.e.,  (u, /x)  =  lim„^oo(c"', /^"')-  In  our  game,  belief  sensibility  is  satisfied  because 
our  proposed  belief  system  updates  the  beliefs  according  to  Bayes’  rule  and  it  assigns  a  positive  probability 
to  each  of  the  information  set.  Theorem  8.2  in  [16]  states  that  in  incomplete  information  multi-stage  games, 
if  neither  player  has  more  than  2  types,  Bayesian  condition  is  equivalent  to  belief  consistency  requirement. 
In  our  game,  6i  =  0, 1,  9j  =  0,  and  hence  consistency  is  fulfilled.  Together  with  the  sequential  rationality, 
the  PBE  in  our  game  is  also  a  sequential  equilibrium.  Since  every  finite  extensive-form  game  has  at  least 
one  sequential  equilibrium,  which  is  a  refinement  to  PBE,  it  also  implies  the  existence  of  PBE  in  our  game. 

5.4  Post-detection  Game  and  Coexistence 

In  the  previous  section,  we  have  discussed  how  to  update  node  j's  belief  system  based  on  Bayes’  rule.  It  is 
natural  that  through  observation,  although  imperfect  at  every  stage  game,  node  j  can  accumulate  a  better 
estimation  about  0*.  Eventually,  after  repeated  monitoring,  there  will  be  a  stage  at  which  node  j  can  predict 
with  confidence  whether  node  i  is  regular  or  malicious. 

5.4.1  Post-detection  Game  Model 

Traditionally  speaking,  after  node  j  has  identified  node  i  as  a  malicious  node,  it  will  try  to  report  and  isolate 
node  i  immediately  to  prevent  future  attacks.  However,  there  are  also  situations  where  “isolation”  may  not 
be  a  good  choice.  Let  us  consider  a  wireless  network  which  operates  on  a  limited  resource  budget.  In  order 
to  prolong  the  lifetime  of  the  network,  every  regular  node  has  to  be  economical  towards  packet  forwarding. 
Hence,  if  a  malicious  node  can  be  used  to  handle  some  of  the  traffic,  it  is  beneficial  not  to  isolate  it. 

However,  there  is  a  trade-off  between  how  much  benefit  a  malicious  node  can  bring  and  what  damage  it 
can  do.  We  denote  np  and  ua  as  the  number  of  successful  forwarding  actions  and  number  of  attacks  taken 
by  a  malicious  node.  Recall  the  cost  of  forwarding  is  cp  and  the  loss  due  to  an  attack  to  the  network  is 
gA-  Thus,  for  a  regular  node,  if  it  observes  that  the  total  saving  due  to  forwarding  (npcp)  a  malicious  node 
contributes  is  greater  than  the  total  cost  due  to  its  attack  (nAPA),  then  keeping  that  node  in  the  network  is 
profitable. 
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To  further  analyze  the  conditions  under  which  a  malicious  node  can  be  kept  and  coexist  with  the  regular 
ones,  we  formally  define  the  post  detection  game.  The  game  has  two  players;  node  i  and  node  j,  both  nodes 
know  the  types  of  their  opponent,  i.e.,  node  j  knows  that  node  i  is  malicious  but  has  not  taken  any  action 
to  isolate  it.  Thus,  6i  =  1,  6j  =  0.  The  actions  available  for  node  i  is  at  =  {Attack,  Forward},  while  the 
actions  for  node  j  is  aj  =  {Monitor,  Idle}.  When  node  j  monitors,  it  keeps  a  record  of  what  node  i  has 
done  since  the  beginning  of  the  game.  It  also  calculates  a  coexistence  index  Ci  =  fipcp  —  fiAgA  for  node 
i,  where  hi?  is  the  observed  number  of  forwarding  actions  and  fiA  is  the  observed  number  of  attacks.  If  Ci 
falls  under  a  certain  threshold  r,  node  j  will  isolate  node  i  and  terminate  the  post-detection  game  because 
keeping  node  i  is  no  longer  beneficial.  If  Ci  >  r,  the  game  will  be  played  in  a  repeated  manner.  The  payoff 
matrix  for  the  post-detection  game  is  the  same  as  the  detection  game  for  0*  =  1  as  was  shown  in  Table  88(a). 


5.4.2  Searching  for  a  Coexistence  Equilibrinm 

Let  us  explore  the  strategies  that  both  nodes  can  take  to  reach  the  equilibrium  of  coexistence.  To  avoid 
confusion,  we  denote  p*  and  q*  as  the  probability  node  i  plays  Attack  and  node  j  plays  Monitor  respectively. 
It  is  noted  that  these  probabilities  are  different  from  the  ones  we  obtained  in  Section  5.3.1. 

We  first  derive  the  Nash  Equilibrium  using  indifference  conditions.  Suppose  the  post-detection  game  is 
played  at  repetition,  i.e.,  subgame  t.  The  expected  payoff  for  player  j  playing  Monitor  is 

uf  {Monitor)  =  {p*[-i{gA  -  cm)(1  -  Pe)  +  (1  -  7)(1  -Pe){gA  -  cm)  -  (1  -  l)PeCM 
-  IPeigA  +  Cm)]}  Pr(Ci  >t)  +  {gA  -  cm)p*{1  -  Pe)  Fr{Ci  <  r)  -  (1  -  p*)cm 
=  [gA{l  -pe+lPe)  -  CM]p*Fl{Ci  >  t)  +  {gA  -  Cm)P*(1  -Pe)Fx{Ci  <  r)  -  (1  -p*)cM- 

(76) 


If  node  j  plays  Idle,  the  expected  payoff  is  always 

uf\ldle)  =  -p*'ygA- 

Thus,  the  indifference  condition  require  {Monitor)  =  u'p {Idle),  and  hence  p*  is  obtained  as: 

*  CM 

P  ~  -  • 

[gA{l  -Pe  +  IPe)  -  Cm]  Pr(Cj  >  r)  {gA  -  cm)(1  -  Pe)  Pr(Ci  <  t)  +  cm  +  igA 

Similarly,  we  can  apply  the  indifference  condition  to  node  i  as: 


(77) 


(78) 


u'P  {Attack) 


{Forward) 


q*{-{gA  +  ca){1  -pe)Pr(Ci  <  r) 

+(1  -Pe)[{gA  -  ca)7  -  ca{1  -7)]Pr(Ci  >  r) 

+  {gA  -  CA)jPe  -  Ca{1  -  l)pe}  “  (1  “  9*)[ca(1  “  7)  “  {dA  “  Ca)7] 
g*{-{gA  +  ca)(1  -7'e)Pr(Ci  <  r)  -h  (c/a7  -  ca)[(1  -  Pe)Fv{Ci  >  r)  +Pe]} 

+(1  -  ^*)(5'A7  -  ca).  (79) 

-cp.  (80) 


Therefore,  q*  can  be  expressed  as: 


*  ^ _ CA  -  gA7  -  CF _ 

^  -{gA  +  ca)(1  -Pe)  Pr(Ci  <t)  +  {gAj  -  ca)(1  -pe){Pr{Ci  >  r)  -  1) 

The  problem  is  then  reduced  to  obtaining  the  probability  distribution  of  Ci.  Let  us  assume  at  the 
beginning  of  the  post-detection  game  Ci  =  cq  >  r.  For  the  sake  of  discussion,  we  also  assume  that  node  j  is 
constantly  monitoring.  Hence,  if  we  consider  I  subgames,  in  each  of  the  subgame,  Ci  is  updated. 
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We  denote  a  random  variable  y  =  Ci  =  co  +  hpcp  —  fiAdA-  Since  the  mixed  strategy  profile  requires  node 
i  to  choose  Attack  with  probability  p*,  hp  and  ha  are  binomially  distributed  as: 


Pr(h,.  =  Np)  =  cf^[{l-p*){l-pe)fn^  -  {I  -  P^-i- -  Pe)t^^  (82) 

Pr(hA  =  Na)  =  cf^\p*{l-Pe)fn'^-P*{^-Pe)f~^^  (83) 


Since  y  =  co  +  hpcp  -  fiAQA  =  cq  +  fipcp  -  {I  -  hp)gA  =  {cp  +  gA)np  -  IgA  +  cq  and  I,  cp,  gA,  cq  are 
constants,  to  get  the  distribution  of  y,  we  first  get  the  distribution  oi  w  =  y  +  IgA  —  cq. 

We  use  the  probability  generation  function  (pgf).  For  discrete  random  variable  x,  its  pgf  is  defined  as 

OO 

Gx{z)  =  E[z^]  =  Y,z^Pt{X  =  x)  (84) 

The  pgf  for  y  is 


Gw{z)  =  E[z^]  = 
i 

=  z-C'f^[(l-p*)(l-p,)]-/[l-(l-p*)(l-p,)f-%}F-+^^^) 

hf=0 

=  {(1  -p*)(l  -p,)  +  [1  - 
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To  relax  the  assumption  of  node  j’s  constant  monitoring,  the  current  stage  t  for  the  analysis  is  \t  =  l/q*~\  ■ 
Therefore,  we  have  obtained  the  equilibrium  strategy  parameter  p*  and  q*  for  every  subgame. 

So  far,  we  have  shown  that  for  the  mixed  strategy  profile,  attaining  a  Nash  Equilibrium  is  feasible.  As 
a  matter  of  fact,  every  game  has  a  mixed  strategy  Nash  Equilibrium.  To  further  refine  the  equilibrium,  we 
apply  the  One-Shot  Deviation  Property  to  derive  the  condition  for  subgame  perfect  Nash  Equilibrium. 

We  take  node  j  as  an  example  and  assume  the  repeated  game  has  no  discount.  In  our  previous  equilibrium 
analysis  using  the  indifference  condition,  we  have  proved  that  deviation  from  p*  or  q*  will  not  increase  the 
payoffs.  Hence,  in  the  following  derivation,  we  show  the  deviation  strategy  is  related  to  G. 

From  (76)  and  (77),  we  can  express  the  expected  payoff  for  node  j  as; 

T 

Uj  =  g*{[gA(l  -Pe  +  IPe)  -  cm\p*  Pr(Ci  >  r) 
t=0 

+{gA  -  cm)p*(1  -pe)Pr(Ci  <  r)  -  (1  -p*)cm}  -  (1  -  q*)p*igA-  (89) 


Suppose  node  j  deviates  at  rth  stage  and  r  <T.  The  deviation  can  be  either  of  the  following  two  cases. 
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Case  1 :  Isolate  node  i  while  Q  >  r.  In  this  case,  if  node  j  attacks  and  is  successfully  observed,  it  will 
be  isolated.  The  expected  payoff  at  this  stage  for  node  j  is 


^j4ev,l  =  -  Pe){gA  -  cm)  “  P*lPe{gA  +  Cm) 

-\p*{l-j)pe  +  {l-p*)]cM}  -  (1  -  q*)p*-fgA}Pr{Ci  >  t) 


(90) 


Case  2:  Keep  node  i  while  Ci  <  r.  Since  node  j  only  deviates  one  stage,  node  i  will  be  isolated  in  the 
next  stage.  The  expected  payoff  for  node  j  at  this  stage  is  the  same  as  above  expect  for  the  last  probability 
term. 


^j4ev,2  =  -  Pe){gA  “  Cm)  “  P*lPe{gA  +  Cm) 

-\p*{l--i)pe  +  {l-p*)]cM}  -  (I  -g*)p*75'yi}Pr(Ci  <  r) 


In  this  way,  the  total  expected  payoff  for  node  j  under  deviation  is 


r— 1 


jC) 


t=0  t=r+l 

OSDP  require  Uj^dev  <  Uj.  After  algebraic  manipulation,  we  have 

gA7{q*Pe  + 1)  +  q*Pe{icM  + 1  -  7)  >  (1  -  g*)igA  +  q*[igA Pr(Ci  <  r)  +  peCM  Pr(Cj  >  r)] 


(91) 


(92) 


(93) 


or 

gAl\pe  +  1  -  Pr(Ci  <  r)]  >  pe[cM  Pr(Ci  >  r)  +  7  -  1  -  70^].  (94) 

To  sum  up,  for  the  equilibrium  on  the  post-detection  game,  we  state  the  following  theorem. 

Theorem  6.  The  post- detection  game  has  a  mixed  strategy  Nash  Equilibrium  when  node  i  attaeks  with  p* 
and  node  j  monitors  with  q*.  This  strategy  is  also  subgame  perfect  if  gA'ylPe  +  ^  —  ^^iCi  <  t)]  >  Pe[cM  Tr{Ci  > 
r)  -I-  7  -  1  -  7Cm]. 


5.4.3  Convergence  of  the  Coexistence  Equilibrium 

The  post-detection  game  described  above  ends  when  Ci  <  r.  Since  Pr(Cj  <  r)  >  0,  the  game  is  of  finite 
stages.  In  this  subsection,  we  try  to  derive  the  expected  length  (number  of  stages)  of  the  game. 

We  focus  on  the  random  variable  Ci.  As  we  mentioned  earlier,  Ci  =  co  +  hpcp  —  UAgA-  Again,  we  assume 
node  j  is  constantly  monitoring.  After  one  stage  game,  the  probability  of  hiT’  =  hi?  -|- 1  is  (1  —p*){l  —pe),  and 
the  probability  of  hyi  =  h/i  +  1  is  p*{l  —Pe)-  Thus,  we  model  the  evolution  of  Ci  as  a  random  process  similar 
to  a  1-dimensional  random  walk,  where  the  value  of  Ci  increases  by  Ci?  with  probability  {1  —  p*){l  —  Pe) ,  and 
decreases  by  gA  with  probability  p*(l  —  Pe)-  The  1  —  pe  term  comes  from  the  unreliability  of  the  channel. 
To  obtain  the  expected  length  of  the  post-detection  game,  it  is  equivalent  to  calculating  the  expected  first 
hitting  time  of  the  random  process  with  the  absorbing  boundary  Ci  =  r. 


Theorem  7.  The  expeet  length  of  the  post- detection  game  is 
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Figure  6:  Realizations  of  the  random  walk. 


Proof:  Let  ??  be  a  random  variable  representing  the  first  hitting  time.  We  assume  that  time  is  divided  into 
slots  and  each  slot  represent  a  stage  game.  It  is  easy  to  see  that  rj  =  hp  +  fiA-  At  every  slot,  the  random 
process  has  2  possible  evolution  directions,  i.e.,  hp  +  1  or  fiA  + 1-  Therefore,  for  rj  slots,  there  are  2^  possible 
realizations. 

We  try  to  calculate  how  many  paths  hit  the  boundary  exactly  on  the  rjih  slot.  The  following  notations 
are  made.  Let  m  =  s  =  {cq  —  t) / cp  and  m,  s  are  integers.  In  Figure  6,  we  interpret  how  to  move  on  a 
grid  according  to  a  random  process.  Consider  a  random  walk  from  the  left  bottom  point,  li  hp  increases, 
move  one  block  right.  If  tla  increases,  move  m  blocks  up.  While  each  block  is  a  squarelet  with  the  length  cp, 
the  width  of  the  grid  is  hpcp,  the  height  is  qa^a,  and  diagonal  line  represents  Ci  =  r.  Each  walk  consists  of 
rj  moves  and  must  end  on  or  beyond  the  upper  rightmost  corner.  What  we  are  interested  in  is  the  nnmber 
of  monotonic  paths  that  wholly  falls  under  the  diagonal  line,  becanse  each  of  those  paths  is  a  realization  of 
the  random  process  which  hits  the  bonndary  for  the  first  time  at  the  ryth  slot. 

While  counting  the  number  of  realizations  under  the  diagonal  line  might  be  difficult,  we  calculate  the 
realizations  that  do  cross  the  line.  Let  the  number  of  realizations  crossing  the  line  be  M,  the  number  of 
realizations  under  the  line  is  then  —  M,  where  is  the  total  number  of  possible  realizations  on  the 
grid.  Consider  a  sample  realization  crossing  the  line  as  shown  in  Figure  6.  Let  d  be  the  number  of  horizontal 
steps  taken  in  the  path  before  hitting  the  diagonal  line.  To  hit  the  line,  at  least  vertical  steps  shonld 
be  taken,  covering  a  total  height  of  {d  +  s)cp.  The  total  number  of  such  paths  is  j-  After  hitting 

m  ' 

the  line,  the  rest  of  the  path  should  consist  of  hp  —  d  vertical  steps  and  the  total  nnmber  of  moves  left  is 
r?  —  —  d.  So,  the  total  number  of  paths  that  cross  the  diagonal  line  is  M  =  ^  j- 

TTL  * — 'tt  _r [_^  ^ IT ^ 
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realizations  hit  the  diagonal  line  for 
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To  sum  up,  out  of  2^  realizations,  ^  C's+d  s+d 


i+d  n-^-d 

the  hrst  time  at  the  ryth  move.  The  probability  of  game  length  being  ry  is  then 
Finally,  we  can  express  the  expected  length  of  the  post  detection  game  as 
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5.5  Countermeasures  for  the  Malicious  Node 


In  our  discussions  so  far,  we  haven  shown  that  it  is  feasible  to  design  strategies  in  order  to  achieve  the 
proposed  PBE  in  the  malicious  node  detection  game.  However,  there  are  still  some  issues  that  must  be 
resolved  before  the  equilibrium  strategies  can  be  applied  and  followed  by  practitioners.  These  issues  can  be 
categorized  into  two  aspects.  First,  the  PBE  requires  the  malicious  node  perfectly  know  the  belief  held  by 
the  regular  node.  However,  in  practice,  the  belief  information  is  never  shared.  Second,  the  malicious  node 
may  not  remain  passive  in  the  detection  game;  instead,  it  can  also  form  its  belief  about  the  current  status 
in  the  game  and  adjust  its  strategy  accordingly. 

It  is  natural  that  not  only  the  regular  node  but  also  the  malicious  node  (node  i)  study  the  game  through 
observation.  In  particular,  node  i  understands  that  although  the  unreliable  channel  makes  the  observations 
inaccurate,  the  more  often  it  attacks,  the  quicker  node  j  can  form  a  correct  belief  about  its  malicious  type. 
Thus,  node  j  should  take  different  strategies  when  different  beliefs  are  held  by  node  j.  These  strategies 
(e.g.,  the  PBE  strategy  in  equation  (72))  are  Markovian  when  we  view  the  beliefs  as  a  set  of  states.  The 
Markovian  strategies  adopted  by  node  i  is  only  determined  by  the  current  state  of  the  belief,  i.e.,  when  the 
belief  update  process  takes  place.  However,  the  belief  held  by  node  j  is  its  private  information,  and  by  no 
means  can  node  i  access  this  information.  Therefore,  it  is  essential  for  node  i  to  construct  its  own  belief 
system,  which  is  the  belief  on  the  belief  node  j  holds  towards  node  i  and  we  call  this  belief  developed  by 
node  i  nested  belief. 

We  denote  fii{iJ,j{6i))  as  the  belief  node  i  holds  about  node  j’s  belief  about  node  i,  i.e.,  iXi{pLj{9i))  is 
the  nested  belief  about  For  the  game  we  presented  in  Table  8(a),  depending  on  the  actions  nodes  i 

and  j  take,  the  payoff  of  node  i,  Ui,  can  be  one  of  the  three  different  values;  —gA  —  ca,  Qa  —  (^A  or  —cp- 
While  the  observations  of  the  payoffs  are  node  i’s  private  information,  given  a  specific  observation  o*,  node 
i  can  predict  the  actions  taken  by  node  j,  despite  the  prediction  may  be  inaccurate.  For  example,  when 
Oi  =  —QA  —  CA,  node  i  knows  for  sure  aj  =  Monitor.  However,  when  Oi  =  —cp,  node  i  cannot  tell  what 
node  i  has  done.  Further,  based  on  the  prediction  of  the  actions  node  j  takes,  node  i  can  update  its  belief 
on  how  node  j's  belief  fijiOi)  has  changed  due  to  aj.  Continuing  with  the  same  examples,  when 
Oi  =  —QA  —  CA,  CLj  =  Monitor,  so  node  j  observes  the  Attack  launched  by  node  i  and  it  will  update  lij{9i) 
according  to  equation  62.  Similarly,  when  Oi  =  qa  —  ca,  node  i  knows  that  node  j  is  idle  and  iaj{9i)  will  not 
change.  However,  the  uncertainty  comes  when  o*  =  —cp,  where  node  i  cannot  accurately  update  its  belief 
about  tij{9i). 

To  construct  the  belief  update  system  for  node  i,  we  employ  the  Bayes’  Theorem.  At  stage  t  of  the  game, 
based  on  the  observation  of\  node  Ps  belief  g,i{9i)  is  updated  as: 


W| 


(96) 


where  0={O,I}. 

The  conditional  probabilities  of  observing  o*  given  its  type  9i  can  be  calculated  as  follows.  To  distinguish 
from  the  strategy  profiles  we  used  previously,  we  denote  p  as  the  probability  node  i  launches  attacks,  and  q 
as  the  probability  node  j  monitors.  Therefore,  the  probabilities  that  arise  due  to  the  different  observations 
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and  node  i’s  type  are: 


P{of  ^  =  -gA  -  CA\9i  =  1)  =  pq{l  -  Pe)  +  (1  -  p)qoi  (97) 

P(of^  =  -gA  -  CA\0i  =  Q)  =  qa  (98) 

P{of  =  gA-  CA\0i  =  1)  =  Mve  +  (1  -  ?)]  (99) 

P(of  =5A-CA|0i  =  0)  =  0  (100) 

P{of  =  -CF\e^  =  1)  =  {l-p)[q{l  -  a)  +  (1  -  q)]  (101) 

P(of  =  -CF\e,  =  0)  =  (1  -  a)q.  (102) 


With  the  above  equations,  for  each  of  the  observations  Oi  £  O,  where  O  =  {—gA  —  ca,  gA  —  ca,  —  cf}, 
is  updated  independently.  Since  for  the  malicious  node  i,  its  type  =  1  is  known  to  itself,  the 
overall  belief  is  hence  updated  considering  each  of  the  possible  observations. 
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Further,  with  the  belief  system  of  node  i,  the  malicious  node  detection  game  can  be  solved  again  to 
obtain  the  sequential  rationality.  The  derivation  process  is  similar  to  what  we  have  presented  in  equations 
(68)  to  (71),  with  the  exception  that  p[^\pj{0i))  will  be  considered.  The  equilibrium  strategy  profiles  that 
reaches  sequential  rationality  are. 


piP  ^  _ bM _ 

pf\pj{9i  =  l))ffA(l  +  7)(1  -  Pe) 
~{k)  _  9a1  —  CA  +  Cf 

^  5(a(1  -Pe)(l +  7)' 


(104) 

(105) 


Moreover,  it  is  easy  to  justify  that  the  belief  update  process  for  node  i  also  satisfies  the  Bayesian 
condition  in  Dehnition  23.  In  addition,  equation  (104)  suggests  that  node  Fs  strategy  is  purely  dependent 
on  the  current  belief  it  holds.  Thus,  we  can  further  refine  the  PBE  in  malicious  detection  game. 


Theorem  8.  With  the  nested  belief  system  for  node  i,  the  dynamic  malicious  node  detection  game  has  a 
Markov  Perfect  Bayes-Nash  Equilibrium  (MPBNE)  when  the  strategy  profiles  are  )=(p^^\ 

It  is  noted  that  the  PBE  obtained  in  Theorem  5  is  also  a  MPBNE,  however,  the  strategy  profile  has 
limited  applicability  because  the  equilibrium  profile  for  node  i  requires  the  knowledge  of  node  j’s  state 
(belief).  On  the  contrary,  the  profiles  in  Theorem  8  only  rely  on  the  private  information  available  to  the 
nodes  themselves. 

A  special  case  for  the  strategy  profile  o'*  is  “Always  attack  when  p^\yLj{9i  =  1))  <  fl  and  forward 
otherwise,  for  a  predehned  threshold  p,  g(0,  1)”.  In  this  strategy,  when  iiP{gj{9i  =  1))  <  p,  p  =  1  and 
node  j  will  progressively  update  its  belief  when  it  monitors  because  node  i  is  always  behaving  maliciously. 
However,  when  the  belief  threshold  is  reached,  node  i  will  refrain  from  launching  attacks,  and  hence  its 
payoff  will  decrease.  It  is  clear  that  the  strategy  deviates  from  the  MPBNE  because  p  does  not  adhere  to 
the  equilibrium.  As  a  result,  node  i  will  be  identified  quickly  and  it  will  be  dormant  for  the  rest  of  the  time. 
While  this  strategy  is  favorable  to  node  j  and  the  network,  from  node  i’s  perspective,  this  strategy  will  limit 
its  attacks  and  hence  it  is  not  desirable. 
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6  Experimental  Study 

In  this  section,  we  discuss  the  experimental  studies  that  we  conduct  to  validate  the  theoretic  models  we 
developed.  Our  experimental  findings  are  presented  in  the  same  order  as  the  analysis,  i.e.,  cooperation 
in  the  homogeneous  channel,  cooperation  in  the  heterogeneous  channel,  and  coexistence  with  malicious 
attackers.  In  particular,  Section  6.1  shows  the  games  played  by  a  cooperation  enforcement  strategy  and  a 
collusion  strategy  with  homogeneous  noise.  In  Section  6.2,  we  setup  a  multi-hop  wireless  network  scenario 
with  heterogeneous  channel  conditions  and  study  the  performance  of  our  proposed  state  machine  based 
cooperation  enforcement  schemes.  Finally,  in  Section  6.3,  we  illustrate  attacker  detection  process  and  show 
the  network  properties  when  the  equilibria  are  reached. 

6.1  Anti-collusion  Game 

To  illustrate  how  the  dynamics  of  the  population  evolves  and  how  collusion  can  be  resisted  in  the  packet 
forwarding,  we  represent  our  findings  through  simulation  which  is  based  on  the  framework  proposed  in  [35]. 
In  the  simulation,  a  total  number  of  100  nodes  adopting  two  different  strategies  are  considered.  Homogeneous 
noise  is  added  to  simulate  the  channel  unreliability.  We  take  the  average  population  share  over  five  simulation 
runs  and  plot  how  the  population  evolves  as  the  games  are  played.  The  strategies  adopted  by  the  nodes  are 
CORE  and  a  naive  collusion  strategy  (CS)  defined  as  follows: 

Definition  24.  Naive  Collusion  Strategy  (CS):  Forward  all  packets  from  the  colluding  group,  discard  all 
packets  from  outside  of  the  group,  i.e., 

_  J  I  if  node  —i  is  also  a  colluding  node. 

I  0  otherwise 


Figure  7:  The  effect  of  initial  population  share. 


In  Figure  7,  we  show  how  the  population  evolves  with  different  initial  population  share.  The  plots  are 
obtained  with  p^  =  0.2,  a  =  1.1.  For  CORE,  we  consider  a  history  of  last  10  steps  {b  =  10).  With  the 
parameter  settings,  Theorem  1  infers  that  CORE  is  SPNE.  It  is  very  clear  from  the  plots  that  the  population 
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adopting  CORE  overtakes  that  adopting  CS  and  the  games  eventually  converge  to  a  point  where  all  the 
population  adheres  to  CORE,  i.e.,  all  the  nodes  are  cooperative.  It  is  also  suggested  that  a  larger  initial 
cooperative  population  =  0.7)  leads  to  a  faster  convergence  of  the  population  evolution. 


Eigure  8:  The  effect  of  history  length  of  CORE. 


We  further  study  how  the  reputation  history  length  b  affects  the  performance  of  CORE.  In  Eigure  8,  we 
compare  the  population  evolution  of  three  different  values  of  b.  While  the  rest  of  the  parameter  settings 
remain  the  same,  we  only  focus  on  the  population  dynamics  of  CORE.  Although  the  results  show  that 
cooperation  can  be  enforced  with  CORE,  the  convergence  rates  are  different.  The  comparison  states  that 
large  values  of  b  help  fast  convergence.  This  is  due  to  the  longer  history  it  takes  into  account,  the  less 
observation  error  it  is  likely  to  make. 


Figure  9:  The  effect  of  channel  unreliability. 
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Figure  9  presents  the  effect  of  the  channel  unreliability  on  the  population  dynamics.  The  plots  clearly 
imply  that  the  more  reliable  the  channel  is,  the  faster  cooperation  can  be  enforced. 

Figure  10  shows  the  combined  effect  of  incentive  value  a  and  the  initial  population  share.  The  first 
observation  is  that  with  a  small  a,  the  population  converges  faster  than  that  of  a  larger  a.  This  is  because 
when  a  is  large,  the  more  benefit  colluding  can  get  and  hence  the  evolution  takes  a  longer  time.  Another 
interesting  observation  is  that  with  the  same  a  value,  very  small  initial  population  share  =  0.2)  fails  to 
reach  the  +1  population  state  (i.e.,  cooperation).  The  reason  behind  is  the  effect  of  SPNE.  In  Theorem  2,  we 
have  shown  that  cooperation  can  only  be  obtained  if  the  cooperative  strategy  is  ESS  or  the  initial  population 
reaches  a  threshold.  In  case  of  Eigure  10,  the  SPNE  condition  of  CORE  requires  a  <  =  1.25,  so  when 

a  =  2,  CORE  is  no  longer  an  ESS.  Thus,  a  small  initial  population  share  will  not  be  able  to  lead  the  entire 
population  to  cooperation. 


Eigure  10:  SPNE  and  the  population  dynamics. 


6.2  Multihop  Cooperation  Enforcement 

In  this  section,  we  evaluate  our  state  machine  based  cooperation  enforcement  packet  forwarding  strategies 
by  simulation.  We  also  show  the  network  performance  under  the  proposed  strategies. 

We  consider  50  nodes  that  are  randomly  scattered  in  an  area  of  1000m  x  1000m.  The  physical  com¬ 
munication  range  is  set  to  be  250m.  During  the  simulation,  log-distance  path  loss  with  exponent  of  3  is 
adopted  as  the  propagation  model,  and  IEEE  802.11  is  the  underlying  MAC  protocol  with  a  bandwidth 
of  2  Mbps.  The  data  packet  size  is  64  bytes  carried  by  Constant  Bit  Rate  (CBR)  type  of  traffic  with  2 
packet  per  second,  unless  specially  mentioned.  We  allow  only  one  data  session  at  a  time.  The  data  sessions 
originate  and  terminate  at  randomly  selected  source  and  destination  nodes.  Eor  the  routing  based  packet 
forwarding  strategy,  a  DSR  like  routing  agent  is  in  place  to  handle  the  route  discovery  and  maintenance. 
Eor  the  hop-by-hop  forwarding,  GPSR  like  forwarding  is  employed  and  it  finds  the  reachable  node  with 
minimum  distance  to  the  destination  as  the  next  hop.  The  source  nodes  have  equal  probability  in  selecting 
the  forwarding  type. 
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To  simulate  the  repeated  nature  of  the  packet  forwarding  games,  any  node  pair  engaged  in  packet 
forwarding,  play  a  number  of  games  with  respect  to  the  discount  factor  5  defined  as  a  system  parameter. 
For  a  given  5,  the  average  number  of  subgames  is  1/(1  —  5).  Therefore,  in  our  simulation,  a  data  session  has 
at  least  1/(1  —  5)  packets  with  one  packet  forwarding  as  a  game. 

The  simulation  runs  for  1000  seconds  with  different  channel  loss  probabilities,  pe  is  set  to  a  random 
number  in  [0.01,  0.2]  as  the  default  value. 

Our  investigation  starts  with  the  one  hop  packet  forwarding  (i.e.,  two-player  packet  forwarding  game). 
We  set  the  game  payoff  matrix  as  T  =  0.8,  R  =  0.7,  P  =  0.1  and  6  =  0.99.  Figure  11  shows  the  average 
payoff  for  each  of  the  nodes  using  our  state  machine  based  forwarding  strategy.  For  comparison,  we  plot 
the  payoff  for  “Full  Cooperation”  strategy  as  well.  Full  cooperation  implies  node  will  always  forward  others 
packet  unconditionally.  The  theoretical  boundaries  for  our  proposed  strategy  are  also  presented.  The  plot 
shows  that  the  payoffs  of  the  proposed  strategy  are  within  the  theoretical  limits  developed  in  Corollary  5. 
Also,  it  is  observed  that  the  payoffs  are  very  close  to  the  unconditional  “Full  Cooperation”  strategy.  The 
average  payoffs  are  much  closer  to  the  upper  bound  than  the  lower  bound  because  when  the  games  reach 
sequential  equilibria,  mutual  cooperation  is  enforced. 


0  10  20  30  40  50 

Node  id 


Figure  11:  Average  node  payoff  for  state  machine  based  forwarding  strategy. 

Figure  12  presents  the  average  node  payoffs  with  different  channel  loss  probabilities  Pe  and  discount 
factor  6.  Note  that  5  =  0.999  implies  1000  subgames  played  while  S  =  0.99  implies  100  subgames.  The 
plots  provide  two  insights.  (1)  As  the  channel  becomes  more  unreliable,  the  average  payoff  drops.  (2)  The 
more  games  played,  the  more  average  payoff  is  generated.  These  observations  also  suggest  that  it  is  more 
desirable  to  have  more  packets  in  one  continuous  data  session  before  switching  for  another  relaying  nodes. 

We  show  the  equilibrium  nature  of  the  proposed  state  machine  based  forwarding  strategy  in  Figure  13. 
The  payoffs  of  deviation  strategies  are  plotted.  In  the  deviation  strategies,  when  the  node  in  state  C,  it 
always  plays  Discard  with  probability  qc  =  0.1  or  qc  =  0.15  (Recall  that  in  our  equilibrium  strategy,  qc  is 
a  very  small  value.).  In  this  setting,  6  =  0.999  and  Pe  =  0.01.  Figure  13  clearly  shows  that  the  payoffs  with 
our  proposed  strategy  are  strictly  greater  that  the  deviation  strategies. 

To  further  evaluate  our  proposed  strategies,  we  consider  the  network  performance.  In  Figure  14,  we 
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Figure  12:  Average  node  payoff  with  different  channel  loss  probability  and  discount  factor. 


present  the  normalized  network  throughput  at  5  =  0.99.  We  denote  1  as  the  state  that  all  the  generated 
packets  are  successfully  delivered  from  source  to  destination.  It  is  shown  that  with  a  small  channel  loss 
probability  {pe  =  0.01),  our  proposed  Multi-hop  Packet  Forwarding  Strategy  (MHPFS)  reaches  almost  the 
same  throughput  as  the  fully  cooperative  strategy.  With  a  larger  pe,  the  throughput  difference  between 
MHPFS  and  the  unconditional  cooperation  case  is  larger. 
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Figure  13:  Average  node  payoff  comparison  with  deviation  strategies. 
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In  addition,  we  analyze  the  effects  of  hop  count  and  channel  unreliability  on  the  throughput.  The  results 
are  shown  in  Figure  15  with  6  =  0.999.  It  can  be  noted  that  throughput  drops  when  channel  becomes  more 
unreliable  or  hop  count  increases.  Also,  our  proposed  MHPFS  yields  throughput  performance  very  close  to 
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Figure  14:  Normalized  data  session  throughput  for  different  channel  loss  probability  and  strategies. 


the  situation  where  all  the  nodes  are  unconditionally  cooperative. 


Figure  15;  Normalized  data  session  throughput  vs.  hop  count  and  channel  loss  probability. 

The  relationship  of  packet  generation  rate  and  throughput  is  presented  in  Figure  16.  6  is  set  to  0.99.  The 
plots  do  not  show  much  difference  for  different  packet  rates  and  the  throughput  remains  almost  constant. 

Last  but  not  least,  we  study  the  effect  of  mobility.  In  this  setting,  pe  =  0.01,  S  =  0.999.  The  mobility 
profile  we  use  is  Random  WayPoint  (RWP)  with  5  seconds  of  pause  in  between  two  consecutive  moves. 
Figure  17  plots  the  throughput  performance  for  two  different  speeds.  The  results  suggest  that  mobility 
introduces  link  break  probability  and  decreases  the  throughput  for  our  proposed  forwarding  strategy. 
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Figure  16:  Normalized  data  session  throughput  for  different  packet  generation  rates. 

6.3  Attacker  Detection  and  Coexistence 

In  this  section,  we  study  the  properties  of  the  perfect  Bayesian  Nash  equilibrium  in  the  malicious  node 
detection  game  and  the  post-detection  subgame  perfect  Nash  equilibrium  derived  in  section  5  through 
simulations.  In  our  simulator,  two  players  play  the  games  repeatedly;  the  payoffs  and  strategy  profiles  for 
each  of  the  subgames  are  recorded  to  analyze  the  properties  of  the  equilibria. 

6.3.1  Malicious  Node  Detection  Game 

We  first  present  the  simulation  results  on  the  malicious  node  detection  game.  In  Figure  18,  we  show  how 
the  monitoring  probability  in  PBE  strategy  increases  with  the  malicious  node  attack  success  rate.  The  plots 
infer  that  the  equilibrium  require  node  j  to  increase  its  monitoring  frequency  as  the  attack  success  rate 
increases.  Also,  as  the  channel  becomes  more  unreliable,  node  j  must  play  Monitor  more  frequently.. 

Figure  19  compares  the  convergence  of  node  j’s  belief  system  when  different  attack  gains  are  presented. 
The  plots  are  shown  with  =  0.01,  7  =  0.95  and  a  =  0.01.  In  Figure  19(a),  we  show  how  the  belief 
system  forms  a  correct  belief  on  the  type  of  node  i  when  only  Attack  is  observed.  The  convergence  of  the 
belief  system  under  PBE  is  illustrated  in  Figure  19(b).  The  plots  suggest  that  the  lower  the  attack  gain 
is,  the  quicker  the  belief  system  converges.  This  property  can  be  explained  as  follows.  A  smaller  attack 
gain  requires  node  i  to  attack  more  often  in  order  to  get  more  payoff,  and  increasing  the  attack  frequency 
also  increases  the  risk  of  being  successfully  observed.  With  more  observations,  the  belief  is  updated  more 
frequently  and  accurately.  Belief  system  converges  slower  in  Figure  19(b)  than  in  Figure  19(a)  because  in 
the  PBE,  instead  of  constantly  monitoring,  node  j  only  monitors  with  probability  q. 

A  more  complete  study  on  the  convergence  of  the  belief  system  is  shown  in  Figure  20.  Plots  in  Figure  20(a) 
indicate  the  larger  the  disguise  cost  cpjcA  is,  the  less  time  it  takes  to  converge.  This  is  because,  with  a 
larger  disguise  cost,  it  is  unprofitable  for  node  i  to  disguise  by  forwarding  packets.  Instead,  it  will  launch 
more  attacks,  thus  increasing  the  chances  to  be  identified.  Figure  20(b)  shows  a  quicker  converged  belief 
system  for  a  smaller  detection  gain.  Figures  20(c)  and  20(d)  relate  the  convergence  with  less  errors  and 
uncertainties  in  the  system.  As  expected,  with  errors  and  uncertainties  (i.e.,  low  channel  loss,  high  attack 
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Figure  17:  Normalized  data  session  throughput  for  mobile  nodes. 


success  rate  and  low  false  alarm  rate),  the  belief  system  converges  quickly. 

Finally,  the  parameters  affecting  the  PBE  attack  probability  p  are  investigated  in  Figure  21.  The  attack 
gain  is  a  very  important  factor  in  determining  the  value  of  p  as  shown  in  Figure  21(a).  A  large  attack 
gain  means  more  payoff  gained  from  an  attack,  which  implies  less  number  of  attacks  are  needed.  Hence  p 
should  be  smaller.  Figures  21(b)  and  21(c)  indicate  that  node  i  should  attack  less  frequently  under  a  reliable 
channel  as  every  attack  is  more  likely  to  be  successful.  However,  as  suggested  in  Figure  20(d),  if  the  false 
alarm  rate  is  high  for  the  regular  node,  the  malicious  node  can  take  advantage  of  it  and  attack  more  often. 

6.3.2  Post-detection  Game 

After  the  belief  system  of  node  j  converges  {6i  >  0.99),  we  can  safely  conclude  that  node  j  has  detected  the 
malicious  node.  Therefore,  the  post-detection  game  starts.  To  show  the  continuity,  at  the  beginning  of  the 
post  detection  game,  node  i  sticks  to  its  PBE  strategy. 

Figure  22  presents  how  the  attack  probability  p*  evolves  to  the  SPNE  strategy  from  the  PBE.  It  is  clear 
in  the  plots  that  in  the  SPNE,  node  i  should  decrease  its  attack  probability  to  avoid  isolation.  Eigure  22(a) 
shows  a  larger  detection  gain  that  corresponds  to  a  smaller  attack  rate;  thus  in  the  equilibrium,  the  payoffs 
for  node  j  will  not  increase  due  to  the  large  detection  gain.  Figure  22(b)  states  that  if  the  channel  is  lossy, 
node  i  should  attack  more  often.  The  reason  behind  this  claim  is  that  the  more  unreliable  the  channel  is, 
the  less  probable  node  j  can  accurately  observe  an  attack.  Plots  in  Figure  22(c)  are  obtained  from  detection 
gain  equals  to  5.  This  figure  shows  that  the  equilibrium  is  not  sensitive  to  the  initial  value  and  threshold  of 
the  coexistence  index  Cj. 

The  expected  length  of  the  post-detection  game  is  shown  in  Figure  23.  First,  the  figure  states  that  the  less 
errors  (i.e.,  less  channel  loss  and  more  successful  attack)  in  the  system,  the  longer  the  post-detection  games 
can  be  played.  Second,  the  length  of  the  game  grows  with  the  attack  gain.  This  interesting  phenomena  can 
be  explained  in  the  following  way.  The  larger  attack  gain  enables  the  malicious  node  to  attack  less  while 
keeping  its  payoff  high.  Thus,  more  often,  the  malicious  node  will  play  as  a  regular  node  to  avoid  isolation. 
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Figure  18: 


Equilibrium  strategy  q  vs.  the  attack  success  rate  in  malicious  node  detection  game. 


This  will  increase  the  time  for  the  regular  and  malicious  nodes  to  coexist.  This  property  can  be  used  to 
extend  the  lifetime  of  the  network. 

Last  but  not  least,  we  show  how  the  network  throughput  can  beneht  from  coexistence.  Similar  obser¬ 
vations  can  be  made  as  the  game  length  property.  With  a  larger  attack  gain,  the  malicious  node  decreases 
its  attack  rate  and  does  more  packet  forwarding  as  a  regular  node.  Therefore,  the  malicious  nodes  can  be 
utilized  to  increase  the  throughput  more  often  as  the  attack  gain  grows.  The  throughput  gain  property 
illustrates  clearly  that  malicious  and  regular  nodes  can  coexist,  and  the  coexistence  equilibria  improve  the 
throughput  of  the  network. 

6.3.3  Characteristics  of  MPBNE 

We  study  the  characteristics  of  the  Markov  Perfect  Bayes-Nash  Equilibrium.  In  particular,  we  are  interested 
in  the  properties  of  node  z’s  belief  update  system  (i.e.,  belief  about  belief)  and  how  the  introduction  of  node 
i’s  belief  would  affect  the  results  we  obtained  in  Section  6.3.1. 

In  Eigure  25,  we  study  node  z’s  belief  system  in  the  MPBNE.  The  plots  are  obtained  with  pe  =  0.01, 
7  =  0.95  and  a  =  0.05.  To  better  show  the  properties  of  node  i’s  belief  system  in  the  MPBNE,  we  also 
present  node  j’s  belief  system.  In  particular,  we  plot  fii  as  node  i’s  belief  system  in  MPBNE  according 
to  equation  103,  fij  as  node  y’s  belief  system  in  PBE  as  stated  in  equation  (67)  and  fi*  as  node  j’s  belief 
system  update  in  the  MPBNE  as  a  result  of  node  i’s  actions  in  the  belief  about  belief  model.  A  common 
observation  is  that  node  i’s  belief  //j  converges  much  faster  than  the  belief  in  PBE,  which  means  that 
node  i  holds  a  false  belief  that  node  j  can  identify  its  malice  quicker  than  node  j  actually  could.  As  a  result 
of  the  inaccuracy  in  node  i’s  belief,  it  takes  longer  time  for  node  j  to  form  a  belief  on  node  i.  This  is  evident 
from  the  plots  that  show  fj,j  converges  much  slowly  than  it  does  in  PBE,  when  node  i  does  not  employ  any 
belief  system. 

In  addition,  Eigure  25  shows  some  similar  properties  of  node  i’s  belief  system  to  what  we  have  observed 
in  Eigure  20.  Eor  example,  Eigure  25(b)  indicates  a  larger  detection  gain  will  force  node  i’s  belief  system 
converge  quicker.  Eigures  25(c)  and  25(d)  infer  that  reliable  channel,  high  attack  success  rate  and  accurate 
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(a)  constant  monitoring 


(b)  PBE 


Figure  19:  Belief  system  update  with  different  attack  gains. 
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(c)  channel  unreliability  and  attack  success  rate 


(d)  false  alarm  rate 

Figure  20;  Effects  of  parameters  on  belief  system  update  (cont.) 
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Figure  22:  Effects  of  parameters  on  the  SPNE  strategy  (cont.) 


Eigure  23:  Expected  length  of  the  post-detection  game. 
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Figure  24;  Throughput  gain. 

detection  (low  false  alarm  rate)  will  also  induce  a  fast  convergence  of  jjLi.  However,  the  only  discrepancy  is 
with  the  disguise  cost;  for  node  a  high  disguise  cost  makes  update  of  slow,  while  for  node  j,  a  high 
disguise  cost  helps  converge  faster.  The  reason  lies  in  the  inaccuracy  of  node  i’s  belief  system.  From  our 
previous  discussion,  it  is  stated  that  when  the  observed  payoff  is  —cp-,  node  i  cannot  predict  what  node  j’s 
action  is.  Thus,  an  internal  error  resides  in  node  i’s  belief  system,  and  this  error  is  amplified  when  cp  is 
large  (i.e.,  cp,  takes  a  high  weight  in  the  payoff),  which  corresponds  to  a  large  disguise  cost. 

The  properties  of  the  MPBNE  strategy  are  further  investigated  in  Figure  26.  Once  again,  similarity  is 
found  between  Figures  26(a)  and  21(a),  as  well  as  Figure  26(b)  and  Figures  21(b),  21(c),  21(d).  Both  the 
MPBNE  strategy  attack  probability,  denoted  as  and  the  PBE  strategy  attack  probability  in  Eigure  21 
will  increase  with  smaller  attack  gain  and  attack  success  rate,  as  well  as  larger  channel  error  rate  and  false 
alarm  rate.  Moreover,  it  is  noted  that  Pm  is  smaller  than  what  node  j  believes  it  would  be  (denoted  as  pj^  in 
the  Eigures  26(b)).  In  addition,  p^  is  larger  than  the  PBE  strategy  attack  probability  Ppbe  in  the  first  several 
stage  games,  however,  as  the  games  repeat,  p^  drops  below  Ppbe-  This  interesting  observation  implies  that 
when  node  i  implements  the  belief  system,  it  attacks  more  aggressively  (than  without  the  belief  about  belief 
model,  i.e.,  in  PBE)  in  the  first  several  games,  because  it  believes  node  j  is  far  from  reaching  a  successful 
detection.  As  the  game  unfolds,  node  i  adjusts  its  attack  rate  to  prevent  from  detection.  The  difference 
between  p^  and  Ppbe  also  explains  why  node  j’s  belief  system  alters  in  the  MPBNE  as  shown  in  Eigure  25. 

6.3.4  Transition  from  Detection  Game  to  Post-detection  Games 

Our  discussions  above  are  focused  on  how  the  involvement  of  node  i’s  belief  system  would  make  the  MPBNE 
different  from  the  PBE.  However,  since  the  detection  of  the  malicious  node  is  not  the  only  aim  of  this  research, 
we  are  also  motivated  to  find  the  link  between  the  MPBNE  and/or  PBE  in  the  detection  game  and  the  SPNE 
in  the  post  detection  game.  Eigure  27  shows  the  equilibrium  strategy  profiles  in  terms  of  attack  probability. 
It  is  clearly  evident  from  the  plot  that  although  in  MPBNE,  node  i  attacks  less  often  in  PBE,  in  order  to 
reach  SPNE,  node  i  still  needs  to  further  lower  its  attack  probability.  As  a  matter  of  fact,  the  post-detection 
game  is  initialized  by  node  j  when  its  belief  about  node  i’s  malice  reaches  a  threshold  value  {pj{0i)  >  0.99 
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(a)  disguise  cost  cpjcA 


(b)  detection  gain  qa/cm 

Figure  25;  Node  i’s  belief  system  update  in  the  Markov  Perfect  Bayes-Nash  Equilibrium. 

in  our  setting).  However,  this  information  is  never  revealed  to  node  i,  so  that  node  i  has  no  idea  if  the 
post-detection  game  has  started  or  not.  When  node  i  is  also  equipped  with  the  belief  system,  it  can  make 
a  prediction  on  when  the  post-detection  game  starts  based  on  its  belief  about  node  j’s  belief.  For  example, 
if  node  i’s  belief  iii{iij{9i))  >  0.99,  node  i  might  assume  that  the  post-detection  game  has  begun  and  adjust 
its  strategy  profile  accordingly. 
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Figure  25:  Node  i’s  belief  system  update  in  the  Markov  Perfect  Bayes-Nash  Equilibrium  (cont.) 
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Figure  26:  Effect  of  parameters  on  the  Markov  Perfect  Bayes-Nash  Equilibrium  strategy. 


7  Conclusions 


Misbehavior  of  nodes  is  a  big  threat  to  the  wireless  network  security  and  quality  of  service  provisioning.  To 
ensure  communication  security,  data  integrity,  and  information  fidelity,  advanced  cryptographic  techniques 
can  be  employed  at  lower  layers  in  the  network  stack.  However,  from  a  distributed  system  perspective,  it  is 
quite  desirable  that  every  node  in  the  network  can  participate  in  the  process  of  identification  and  mitigation 
of  the  misbehavior. 

This  project  addresses  mitigating  the  misbehavior  in  wireless  networks  from  a  game  theoretic  perspective. 
In  particular,  we  take  the  packet  forwarding  process  as  an  example  and  discuss  in  detail  how  to  stimulate  and 
enforce  every  node  in  the  network  to  be  cooperative.  Due  to  the  unreliable  nature  of  the  wireless  channel, 
the  cooperation  enforcement  design  covers  both  homogeneous  and  heterogeneous  channel  conditions.  In 
the  former  scenario,  we  derive  the  necessary  and  sufficient  conditions  to  achieve  cooperation  with  various 
strategies.  We  also  examine  the  cooperation  enforcement  process  with  evolutionary  game  theory  by  proposing 
an  anti-collusion  game.  Our  analysis  shows  that  any  strategy  that  resists  collusion  will  lead  to  full  cooperation 
in  the  network.  In  the  latter  scenario,  we  model  the  effect  of  heterogeneous  unreliable  channel  as  imperfect 
private  monitoring  in  the  game.  We  propose  a  state  machine  based  strategy  that  ensures  the  cooperative 
actions  are  the  sequential  equilibrium.  In  addition,  extensions  are  made  to  the  strategy  profile  so  that  it 
can  also  enforce  cooperation  in  a  multi-hop  wireless  network. 

Our  discussion  on  mitigating  the  misbehavior  continues  with  the  presence  of  malicious  attacks  in  the 
network.  Since  the  malicious  nodes  bear  the  goal  to  cause  harm  to  the  network,  it  does  not  fall  into  the 
scope  of  cooperation  enforcement  schemes.  Since  the  malicious  node  can  camouflage  as  regular  nodes  when 
it  is  not  attacking,  we  devise  a  Bayesian  detection  game  to  model  the  malice  detection  process.  As  the 
detection  process  goes,  the  regular  node  that  monitors  updates  its  belief  on  the  identity  of  the  malicious 
node.  This  problem  gets  complicated  when  we  consider  the  channel  unreliability  that  affects  the  observation 
accuracy.  We  further  study  the  even  more  complicated  problem  when  the  countermeasures  are  available 
to  the  malicious  node,  i.e.,  the  malicious  node  can  also  study  the  games  and  form  beliefs  about  whether 
its  identity  has  been  revealed.  Equilibrium  solutions  are  obtained  to  show  the  properties  of  the  detection 
process.  Unlike  other  methods  that  isolate  the  malicious  nodes  upon  detection,  we  argue  that  it  is  beneficial 
to  keep  them  in  the  network  as  long  as  they  can  be  taken  advantage  of.  Hence,  we  formulate  a  post-detection 
game  and  derive  the  equilibrium  that  enables  regular  and  malicious  nodes  to  coexist.  Moreover,  we  also 
obtain  the  expected  time  when  the  malicious  node  should  not  be  kept  any  longer. 

Our  analysis  is  backed  by  extensive  simulations.  The  experimental  study  not  only  validates  our  design 
notion,  but  also  shows  the  properties  of  our  solutions.  In  the  anti-collusion  game  with  homogeneous  unreliable 
channel,  we  illustrate  the  effects  of  initial  population  share,  channel  unreliability,  and  payoff  matrix  on  the 
convergence  of  cooperation  evolution.  Under  heterogeneous  lossy  channel  scenarios,  the  simulation  results 
illustrate  the  efficiency  of  the  proposed  state  machine  based  forwarding  strategies.  In  addition,  network 
throughput  performance  is  measured  with  respect  to  parameters  like  channel  loss  probability,  route  hop 
count  and  mobility.  Results  suggest  that  the  performance  due  to  our  proposed  strategy  is  in  close  agreement 
with  that  of  unconditionally  cooperative  nodes.  In  the  study  of  coexistence,  through  simulation,  we  show 
that  the  coexistence  equilibrium  helps  to  extend  the  length  of  the  games  and  improves  the  throughput  of 
the  network.  Experimental  results  also  indicate  that  with  the  help  of  the  proposed  nested  belief  system,  the 
malicious  node  is  able  to  adjust  its  strategy  in  the  game  and  finally  the  detection  game  and  post-detection 
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game  are  integrated  with  effective  transition. 

The  outcome  of  this  research  serves  as  a  starting  point  for  future  research  in  the  subject  of  misbehavior 
identification  and  mitigation.  For  example,  our  research  on  malicious  node  detection  game  only  considers 
single  user  detection,  therefore,  the  channel  condition  is  invariant.  In  follow-up  work,  networked  detection 
scheme  can  be  designed  such  that  every  regular  node  in  the  network  can  participate  in  the  detection  pro¬ 
cess  and  share  their  observations.  In  a  networked  scenario,  the  channel  conditions  are  heterogeneous  and 
cooperation  among  all  regular  nodes  are  desirable.  From  this  perspective,  the  findings  in  section  4  might  be 
applicable.  Nonetheless,  many  issues  remain  open,  e.g.,  the  channel  unreliability  effects  on  the  information 
exchange  among  regular  nodes,  the  topology  dynamics  when  the  nodes  are  mobile,  and  unbalanced  data 
flows.  Furthermore,  the  solutions  proposed  in  the  research  and  their  properties  can  provide  insights  to  other 
similar  security  methods,  because  in  our  approach,  we  focus  on  the  perspective  of  individual  nodes  and  the 
solutions  obtained  herein  reach  the  equilibrium  in  the  network. 
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